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Abstract 


y  The  modelling  and  segmentation  of  images  by  MRF's  (Harkov  Random 
Fields)  is  treated.  These  are  two-dimensional  noncausal  Markovian 


Stochastic  Processes.  Two  conceptually  new  algorithms  are  presented  for 
segmenting  textured  images  into  regions  in  each  of  which  the  data  is 
modelled  as  one  of  C  MRF's.  The  algorithms  are  designed  to  operate  in  real 
time  when  implemented  on  new  parallel  computer  architectures  that  can  be 
built  with  present  technology.  A  doubly  stochastic ■ representation  is  used 
in  image  modelling.  Here,  a  Gaussian  MRF  is  used  to  model  textures  in 
visible  light  and  infrared  images,  and  an  auto-binary  MRF  to  model  a  priori 
information  about  local  image  geometry.  Image  segmentation  is -realized  as 
maximum  likelihood  estimation.  In  addition  to  providing  a  mathematically 
correct  means  for_JLntroducing  geometric  structure,  the  auto-binary^ (or 
ternary,  etc.)  MRF  can  be  used  in  a  generative  mode  to  generate ^.mage 
geometries  and  artificial  images,  and  such  simulations  constitute  a  very 
powerful  tool  for  studying  the  effects  of  these  models  and  the  appropriate 
choice  of  model  parameters.  The  first  segmentation  algorithm  is 
hierarchical  and  uses  a  pyramid-like  structure  in  new  ways  that  exploit  the 
mutual  dependencies  among  disjoint  pieces  of  a  textured  region.  The  second 
segmentation  algorithm  is  a  relaxation-type  algorithm  that  arises  naturally 
within  the  context  of  these  noncausal  MRF's.  It  is  a  simple,  maximum 
likelihood  estimator.  The  algorithms  can  be  used  separately  or  together. 
The  algorithms  have  the  desirable  properties:  (i)  the  required  computation 
appears  to  be  close  to  the  minimum  required  for  segmenting  images  modelled 
by  MRF's;  (ii)  the  segmentation  can  operate  in  adaptive  modes,  estimating  a 
priori  unknown  fixed  or  spatially  varying  MRF  parameters;  (iii)  the 

segmentations  are  unusually  accurate -  usually  to  within  one  or  two  pixels 

in  the  experiments  run.  A  contribution  of  the  paper  is  that  a  start  is 
made  in  discussing  and  exploring  that  inherent  structure  of  MRF's  that  must 
be  exploited  in  order  to  construct  physically  meaningful  MRF  models  for 
representing  real  textured  images,  and  in  order  to  devise  segmentation 
algorithms  that  are  computationally  efficient. 


History  of  the  Textured  Image  Segmentation  Problem 


The  use  of  Markovian  Random  Fields  to  model  image  textures,  textured 
region  geometry,  and  textured  region  boundary,  and  then  the  application  of 
maximum- likelihood  or  Bayesian  estimation  methods  for  segmenting  such  images 
has  a  growing  history.  Cooper,  et.  al.  [1],  have  used  white  Gaussian  fields 
to  represent  texture,  and  unilateral  (causal)  Markov  processes  to  represent 
region  boundaries.  Image  segmentation  was  realized  as  the  maximization  of  the 
conditional  likelihood  of  a  boundary  given  the  data  image.  In  [2] ,  Cooper, 
et  al.,  used  the  same  representation  and  developed  an  approach  that 
partitions  an  image  into  small  square  windows,  does  boundary  estimation 
simultaneously  in  all  windows  through  the  use  of  dynamic  programming,  then 
seams  the  windows  together  using  again  dynamic  programming.  Again,  using  the 
same  representation,  Schenker,  et.  al.  [3],  demonstrated  the  hierarchical 
"ripple  filter",  a  local  relaxation-type  algorithm  to  do  maximum  likelihood 
image  segmentation.  Elliott  and  Scharf  [4]  were  the  first  to  develop  a 
dynamic  programming  algorithm  for  boundary  estimation,  and  then  Elliott, 
et.al.  [5]  developed  a  dynamic  programming  algorithm  for  segmenting  multi¬ 
level  texture.  Here  a  white  Gaussian  field  with  different  means  (for  the 
different  textured  regions)  was  used  to  model  texture,  and  a  MRF  was  used  to 
model  region  geometry.  This  doubly-stochastic  modelling  for  texture  and 
region  geometry  was  also  adopted  by  Thernen  [6] .  He  used  a  white  or  a  causal 
colored  texture  field  for  describing  the  texture  of  the  different  regions  in 
the  image,  and  a  binary  field  based  on  a  2D  Markov  chain  introduced  by  Kaufman 
and  Woods  [7]  to  model  prior  information  about  texture  region  connectivity. 

The  image  segmentation  algorithm  he  used  was  an  iterative  one  where  the 
texture  region  association  of  a  pixel  (i,j)  was  based  on  the  relative  numbers 
of  pixels  assigned  to  the  2  texture  types  in  a  small  square  about  (i,j)  and  on 
its  texture  conditional  likelihood.  Later,  maximum  a  posteriori  estimation 


(MAP)  of  an  original  image  given  the  degraded  observations  through  stochastic 
relaxation  was  also  adopted  by  the  Gemans  [8] .  They  viewed  the  image  as  a 
pair  of  processes  —  the  intensity  process  and  the  line  process.  These  were 
modelled  by  MRF's  (with  a  relatively  small  neighborhood)  whose  Gibbs  potential 
was  assumed  known.  Image  restoration  was  achieved  through  an  annealing 
schedule  that  forces  samples  from  the  posterior  distribution  towards  the 
minimal  energy  configuration  (i.e.  the  maximum  of  the  posterior  distribution). 

Finally,  we  mention  three  lines  of  research,  not  because  they  deal  with  the 
segmentation  of  MRF's,  but  rather  because  they  use  algorithms  close  in  spirit 
to  those  that  we  discuss.  Chen  and  Pavlidis  [9]  presented  a  hierarchical 
approach  to  textured  image  segmentation  involving  image  data  modelled  by 
noncausal  2-D  Gaussian  processes.  No  use  was  made  of  MRF's  or  their 
properties.  They  expressed  the  segmentation  problem  as  a  sequence  of  tests  of 
hypotheses  on  a  quadtree  data  base  and  within  the  framework  of  a  split-and- 
merge  algorithm.  Regions  of  arbitrary  initial  segmentation  were  tested  for 
uniformity.  If  they  were  not  uniform  they  were  subdivided  into  smaller 
regions,  or  set  aside  if  the  appropriate  statistics  were  below  a  given 
threshold.  Subject  to  cluster  analysis,  similar  uniform  regions  were  merged 
as  constituting  a  texture-type  region.  Updated  estimates  for  the  parameters 
of  each  random  field  were  obtained  as  the  uniform  regions  became  larger. 

These  were  in  turn  used  to  classify  some  of  the  remaining  unclassified  small 
regions.  Rosenfeld  and  colleagues  have  developed  'relaxation'  algorithms  for 
parallel  processing.  Though  they  do  not  do  maximum  likelihood  estimation  nor 
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do  they  use  the  image  data  in  iterations  following  the  initial  segmentation, 

% 

nevertheless,  they  have  shown  that  their  approach  involving  "pseudo-likelihoods"  caf^-V-V, 
used  in  many  important  applications,  e.g.,  see  [10] .Faugeras  and  Berthod  [ll]  deve lS'C'-’y^? 
a  relaxation-like  algorithm  but  based  on  the  extremization  of  a  performance  func t io^*^*-"-^ 

.  _  JfL, 

Their  application  was  object  classification  based  on  multi-spectral  data.  Hinton, 
et.al.  [25],  are  developing  elements  of  an  approach  to  scene  understanding  based  oik^/V/ 
networks,  an  energy  performance  functional  ,  and  a  probabilistic  relaxation 
optimization  algorithm.  Though  the  referenced  papers  do  not  deal  with  textured  ima>r-‘'v- 
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segmentation,  there  is  a  significant  overlap  in  the  kinds  of  equations  that  they  and  we 
ultimately  deal  with. 

The  first  published  papers  on  maximum  likelihood  segmentation  of  textured 
images  when  noncausal  Gaussian  MRF's  are  used  for  data  modelling  and  another 
MRF  is  used  for  texture-type  region  modelling  are  those  of  Cohen,  et  al.  [12, 

13,  14].  Hierarchical  and  parallel  iterative  (relaxation-type)  segmentation 
algorithms  were  introduced,  and  so  was  the  use  of  these  in  combination. 

1 . 1  Brief  Overview  of  our  Use  of  Windows.  Our  Approach  to  Textured  Image 
Segmentation,  and  the  History  of  Markov  Random  Fields  for  Texture  Modelling. 

We  model  textured  images  on  two  levels,  one  unobserved  level  to  describe 
the  geometry  of  the  basic  regions  within  each  of  which  the  image  has  one  type 
of  texture,  and  an  observable  level  to  describe  the  textured  image  data  in 
each  region.  Under  this  scenario  for  image  generation,  nature  first  generates 
the  texture-type  regions,  and  then  fills  in  each  region  with  an  image  having 
the  appropriate  texture.  Two-dimensional  MRF's  are  used  as  texture  and  region 
models.  The  MRF  models  are  a  class  of  parametric  models  that  have  been 
studied  extensively  by  Besag  [  15 ]  and  Bartlett  [  16 ]  as  a  generalization  to 
the  Ising  model.  Gaussian  MRF's  have  been  extensively  studied  by  Woods  [17] 
and  Jain  [18],  among  others.  Woods  has  considered  them  for  image  filtering 
[19].  They  have  been  successfully  used  by  Cross  [20],  and  Kashyap  and 
[21],  among  ethers,  to  model  a  variery  of  stationary  texture  fields  such  as 
Brodatz  textures.  Theory  has  been  presented  by  besag  [15],  as  well  as  by  Kashvap  and 
Chelappa  [21]  for  estimating  parameters  to  adapt  the  models  to  stat ionarv 
field  data.  Kashyap' s  and  Chelappa' s  work  [21]  also  included  texture 
recognition  for  rectangular  textured  regions. 

The  segmentation  problem  consists  of  partitioning  a  textured  image  into 
regions,  in  each  of  which  there  is  one  of  C  possible  texture  types  (texture 


classes).  This  means  that  a  texture-type  classification  must  be  made  for  each 
pixel.  Since  a  connected  single  texture-type  region  will  usually  contain  many 
pixels,  we  consider  a  simplification  that  considerably  reduces  the  required 
computation.  The  simplification  is  to  partition  the  image  into  small  square 
windows  and  to  process  each  window  under  the  assumption  that  it  contains  one 
or  at  most  two  texture  types.  Then  each  window  is  processed  as  a  separate 
subimage,  and  the  results  combined  in  an  appropriate  way  (see  Section  4.5). 
Hence,  most  of  this  paper  is  concerned  with  segmenting  a  small  square  window 
of  image  into  two  regions,  each  of  which  comprises  one  of  two  image  texture 
types.  Extension  to  three  or  more  texture  types  in  a  window  is  immediate. 

A  precise  brief  statement  of  our  segmentation  approach  is  the  following. 
Nature  partitions  an  NxN  pixel  window  into  two  (or  more)  region-*  denoted  as 
texture-type  regions  0  and  1.  Within  region  k,  model  k  is  used  to  generate 
the  data.  The  image  data  in  region  0  is  statistically  independent  of  that  in 
region  1.  Let  sjj  be  a  binary  variable  taking  values  0  and  1,  denoting 
texture-type  0  and  1,  respectively,  and  let  S  denote  the  dimensional  vector 
having  components  sj_j.  Then  the  vector  S  specifies  the  partition  of  the  NxN 
pixel  window  into  the  texture-type  0  and  texture-type  1  regions.  Let  y.^ 
denote  the  picture  function  (i.e.,  image  data)  at  pixel  (i,j),  and  Y  be  the  N^ 
component  vector  having  components  yij*  Then  Y  is  the  vector  of  all  picture 
function  values  in  the  NxN  pixel  window.  If  S  is  viewed  as  a  constant  but 
unknown  vector,  the  likelihood  of  the  data  in  the  window  is 

p(Y,S)  =  p(Y0|0)  p(Yx  jl)  (1) 

Here,  p(Y, S)  is  the  liklihood  of  Y  given  the  partition  S.  Yk  denotes  the 
picture  function  vector  for  texture-type  region  k,  and  piYjJk)  denotes  the 
likelihood  of  Y^  given  the  probability  model  for  image  data  in  texture-type 


region  k.  Hence,  the  simplest  texture  segmentation  problem  is  to  determine 
the  partition  S  for  which  Eq.l  is  a  maximum. 


The  second  problem  is  that  in  which  S  is  itself  a  random  vector.  That 
is,  nature  partitions  the  window  into  texture-type  regions  in  accordance  with 
some  probabilistic  process.  Then  the  segmentation  problem  is  the 
determination  of  S  for  which  the  posterior  likelihood 

p(s|y)  (2) 

is  a  maximum.  Note,  equivalent  to  maximizing  (2)  is  finding  the  S  that 
maximizes  (3)  or  (4) 

Jin  p(Y, S)  (3) 

Jin  p(S)  +  Jin  p(Y|s)  (4) 

1.2  Contribution  of  this  Paper 

The  present  paper  is  an  in-depth  treatment  and  extension  of  material 
briefly  introduced  in  [13,14],  It  makes  the  following  contributions.  Certain 
properties  of  MRF's  need  to  be  exploited  in  order  to  do  physically  meaningful 
image  data  modelling  that  is  also  mathematically  consistent.  Some  of  these 
questions  are  raised  and  one  solution  is  presented.  Parallel  iterative  and 
hierarchical  algorithms  are  presented  for  maximum  likelihood  segmentation  of 
textured  images.  Our  iterative  algorithm  will  generally  require  much  less 
computation  than  will  an  annealing  algorithm,  and  will  also  work  with 
nous tatianury  models  and  data,  whereas  the  annealing  algorithm  usually  wi]  1 
not.  Again,  there  are  properties  of  the  MRF's  that  can  be  exploited,  this 
time  to  substantially  reduce  the  amount  of  required  computation.  A  divide- 
and-conquer  approach  is  taken  in  our  hierarchical  segmentation  algorithm  that 
reduces  the  number  of  multiplication-equivalent  operations  to  42,257  for  a 
64x64  pixel  window.  This  is  to  be  compared  with  ten  to  fifteen  times  that 
number  of  multiplication-equivalent  operations  for  a  direct  hierarchical 
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approach.  The  simplest  iterative  algorithm  is  mathematically  correct,  but 
lacks  a  certain  physical  meaningfulness.  This  problem  is  addressed  and  one 
solution  is  briefly  proposed.  A  number  of  experiments  are  run  with  real 
data.  This  turns  out  to  be  important,  as  the  segmentation  of  real  data  is 

much  more  difficult  than  is  that  of  artificially  generated  data - especially 

since  real  image  texture  data  is  generally  nonstationary.  One  approach  to 

parameter-adaptive  segmentation  is  briefly  discussed  and  illustrated.  This 
estimation  of  unknown  model  parameters  during  the  segmentation  process  is 
required  in  practice.  Our  algorithms  appear  to  be  robust,  and  effective  even 
when  the  models  do  not  represent  the  data  well. 

2.0  Markov  Random  Fields  (MRF) 

We  use  the  auto-normal  MRF  (i.e.,  Gaussian  process)  as  the  texture  data 
model,  and  the  auto-binary  process  for  modelling  region  geometry  for  an  image 
window  that  contains  at  most  two  different  texture  types.  Let  r=(i,j)  index 
pixel  location  where  i,j  specify  pixel  row  and  column  location  and  satisfy 
1  <_  i?j  <_  n.  Let  xr  denote  a  random  field,  with  xr  the  field  at  pixel  r,  X 

a  vector  specifying  the  field  over  an  entire  N  x  N  window  and  having 

components  xr,  and  Xr  the  field  everywhere  but  at  pixel  r.  By  {xr}  a  MRF  we 
mean  that 

t 

P<*r|Xr)  -  p(xr|xv,veDp) 

Here,  Dp  denotes  a  neighbor  set,  and  p(xr|xr)  denotes  the  conditional 
likelihood  of  xr  given  Xr. 

The  nature  of  Dp  is  illustrated  in  Fig.  1.  As  indicated  in  Fig.  1,  a  1st 
order  MRF  is  one  for  which  the  neighbor  set  consists  of  the  four  pixels 
constituting  the  north,  south,  east,  and  west  neighbors  of  the  center  pixel.  A 


2nd  order  MRF  is  that  for  which  the  neighbor  set  is  the  first  layer  of  eight 
pixels  surrounding  the  center  pixel,  i.e.,  all  pixels  marked  1  or  2.  A  3rd 
order  neighbor  set  consists  of  all  pixels  marked  1,  2  or  3,  etc.  In  general, 

Dp  =  {v  =(t,m)  such  that  |  | r— v [  f  <_  Np  and  Wr} 

Where  P  is  the  order  of  the  process,  and  Np  is  an  increasing  function  of  P. 

Np  takes  the  values  1,2, 4, 5, 8, 9  for  P=l,2,..,6  respectively. 

2.1  The  Gaussian  MRF  Texture  Model 

The  observed  texture  field  (i.e.,  image  data  field)  is  denoted  y£j  or 
just  yr,  r  =  (i,j).  The  conditional  likelihood  for  the  kth  texture  class  is 
Gaussian,  and  is  given  by 

p(yr  !  Yr  ,  class  k)= 

“  [2iro2(k)l  -5£  exp{-(l/2o2(k))[yr-jj(k)  -  £  8r_v(k) (yv-p(k) )] 2)  (5) 

vcD 

P 

For  {yr }  stationary,  8r-v  must  satisfy  certain  restrictions.  However, 
nonstationary  fields  are  also  of  interest  in  this  paper.  For  a  stationary 
field,  the  joint  probability  density  function  (pdf)  for  the  image  in  texture 
region  k  given  that  the  field  is  0  outside  this  region  is  shown  by  Besag  [ 15 j  to 

p(Y|class  k.)  =  [2iro2(k)]"M^2|3(k)  |1/2exp{-[(Y-U(k))t  B(k)  [Y-U(k)]/2tr2(k>  )  (5a) 

where  U(k)  *  p(k)(l, 1, . . , l)t,  and  Y  is  an  (M  x  1)  column  vector  whose  elements 
are  the  picture  function  yr  at  the  pixels  in  the  M  pixel  region.  B(k)  is  a 
symmetric  positive  definite  (MxM)  matrix  whose  diagonal  elements  are 
unity,  and  whose  off-diagonal  element  is  equal  to  -$r_v,  where  8r- y  indicates 
the  strength  of  the  spatial  interaction  between  the  MRF  at  points  r  and  v. 

$r-v  is  zero  if  points  r  and  v  are  not  neighbors. 


We  use  the  noncausal  binary  process  to  model  region  geometry.  It  has 
been  used  by  others  to  model  image  textures  (Cross  [20]  ).  For  example,  if  one 
texture-type  region  consists  of  small  elongated  blobs  and  another  consists  of 
large  regions,  a  binary  MRF  model  can  be  designed  to  generate  such  patterns. 

Or  if  two  texture-type  regions  are  large  with  boundaries  of  low  curvature,  a 
binary  model  can  be  designed  to  generate  these  patterns.  For  the  local  0,1 
pattern  shown  in  Fig.  2,  if  nature's  goal  was  to  generate  regions  with  long 
smooth  boundaries  she  would  generate  an  s  value  of  0  with  higher  probability 
than  an  s  value  of  1  at  location  x.  Let  S  denote  the  region-field  modelled  by 
the  noncausal  auto-binary  Markov  field  that  describes  the  geometry  of  the 
regions  in  the  image.  For  an  N  x  N  image,  S  has  components  {sij>’  each  s^ 
taking  the  value  0  or  1  and  describing  the  allocation  of  pixel  r  ■  ( i , j )  to 
either  class  I  or  class  II.  The  conditional  probability  takes  the  form 

p(sr|sr)  =  exp  (sr  Tr)/[1  +  exp(Tr)]  (6) 

where  T^is  given  by 

Tr  =  a  +  I  brvsv  (7) 

vcDp 

For  {sij}  stationary,  -  br_v  for  all  r  =»  (i,j)  and  v  »  (w,n). 

3.0  The  Segmentation  Problem 
Overview 

The  window  segmentation  problem  of  concern  in  this  paper  is  that  a  window  may 
contain  two*  texture-type  regions,  each  from  one  of  C  classes,  and  the 
window  is  to  be  partitioned  into  two  regions,  each  consisting  of  one  texture 

*  Three  or  more  are  possible,  but  the  required  computation  is  more  extensive. 
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type.  We  use  maximum  likelihood  estimation  for  this  purpose.  The  two 
difficulties  that  arise  in  the  segmentation  problem  are:  first,  likelihoods 
must  be  computed  for  texture  regions  of  irregular  shape;  second,  likelihoods 
must  be  computed  for  many  of  the  possible  partitionings  of  the  window.  The 
challenge  of  the  first  problem  is  that  there  is  no  obvious  simple  way  of 
computing  the  joint  likelihood  of  a  noncausal  MRF  over  an  irregular  region. 

The  challenge  of  the  second  problem  is  to  be  able  to  compute  the  picture 
function  likelihoods  for  many  different  partitions  of  a  window  without  having 
to  do  a  horrendous  amount  of  computation. 

This  paper  presents  simple  practical  solutions  to  both  problems.  The 
first  algorithm  we  present  is  a  hierarchical  pyramidal-type  algorithm,  whereas 
the  second  one  is  an  iterative  relaxation-type  algorithm.  Both  algorithms  can 
handle  textured  images  modelled  by  either  stationary  or  nonstationary  MRF 
texture  models. 

The  reason  for  partitioning  the  image  into  windows  is  threefold:  1)  it 
permits  parallel  processing,  since  windows  can  be  segmented  simultaneously 
using  appropriate  hardware  architectures,  and  the  results  then  seamed 
together;  2)  by  having  one  or  at  most  two  texture  classes  in  a  window,  the 
processing  is  relatively  simple;  3)  if  the  windows  are  small,  the  texture 
model  within  a  window  can  be  treated  as  being  spatially  stationary,  i.e., 
having  parameters  that  are  constant  for  the  entire  window  (and  satisfying  certain 
restrictions).  This  is  especially  convenient  if,  as  is  usually  the  case,  the  texture 
model  parameters  are  a  priori  partially  unknown  or  are  spatially  slowly  varying  and 
must  be  estimated  during  the  segmentation  process. 
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3. 1  The  Hierarchical  Algorithm. 


In  the  hierarchical  algorithm,  we  often  refer  to  the  two  texture-type 
regions  as  object  and  background  regions.  The  reason  for  this  is  partly 
historical  and  stems  from  the  fact  that  a  goal  is  often  to  isolate  a  specific 
object  without  caring  about  the  interpretation  of  the  surrounding  region. 
Examples  are  an  image  of  a  kidney  in  a  CATSCAN,  a  vehicle  in  an  infrared 
image,  a  tree  in  a  visible  light  image,  etc.  Such  an  object  often  appears  in 
an  image  as  a  convex  textured  blob.  The  intersection  of  a  window  with  such  a 
region  will  be  connected.  Hence,  for  the  hierarchical  algorithm  we  have 
imposed  the  constraint  that  the  estimate  of  one  of  the  textured  regions  in  a 
window,  the  object  region,  be  connected.  Our  algorithm  can  easily  be  modified 
to  require  that  the  estimated  object  and  background  regions  both  be  connected 
regions,  or  to  not  require  connectivity  in  either  type  of  region  estimate. 

The  segmentation  sought  is  that  which  maximizes  the  likelihood  of  the 
data.  In  each  window  the  segmentation  algorithm  is  hierarchical  and  uses  a 
quadtree-like  data  structure.  The  window  is  divided  into  four  quadrants  and 
the  best  segmentation,  i.e,  grouping  of  these  four  blocks,  is  obtained.  Then 
each  of  the  four  blocks  is  itself  partitioned  into  four,  and  the  process  is 
repeated  until  the  block  size  is  such  that  each  block  consists  of  one  pixel. 

At  each  3tage  of  the  hierarchy,  the  object  connectivity  constraint  is  met. 

Each  stage  of  the  algorithm  involves  working  with  blocks  of  a  certain 
size  and  consists  ot  two  steps. 

Step  "Region  growing". 

At  the  end  of  the  nth  stage,  the  window  has  been  segmented  into  two  sets, 
one  of  which  is  referred  to  as  the  object  region  and  is  constrained  to  be 
connected.  For  the  (n+l)st  stage,  we  shrink  and  expand  the  object  by 
partitioning  each  block  in  the  neighborhood  of  the  estimated  boundary  into 
four  and  obtain  the  best  grouping  of  the  four  resulting  quadrants.  There  will 


be  16  partitions  to  consider.  Since  under  the  auto-normal  model  assumption 
neighboring  blocks  are  spatially  interacting  among  each  other  (i.e.,  partially 
correlated  along  their  conmon  boundaries)  the  likelihood  of  each  partition  is 
conditioned  on  the  segmentation  of  the  surrounding  blocks  determined  at  the 
nth  stage.  This  use  of  stochastic  dependence  is  crucial  to  achieving  good 
segmentation.  This  is  especially  true  when  the  blocks  are  of  small  sizes, 
since  a  small  block  may  contain  only  a  portion  of  a  cycle  of  texture  data,  and 
consequently,  texture-type  classification  cannot  be  based  on  the  data  content 
of  the  small  window  alone.  Rather,  use  must  be  made  of  the  continuity  of  the 
texture  data  and  its  first  or  higher  derivatives  at  the  boundary  between  the 
small  window  and  the  surrounding  image  region  of  the  same  texture  type.  Our 
maximum  likelihood  segmentation  algorithms  automatically  include  information 
roughly  equivalent  to  this.  All  object  blocks  constituting  the  boundary  of 
the  object  as  well  as  all  background  blocks  neighboring  the  boundary  blocks 
are  processed  simultaneously. 

Step  2_.  "Object  connectivity" 

We  check  for  the  single-object-region  connectivity  constraint.  If  there 
is  more  than  one  object-region,  we  merge  them  into  one.  The  merging  is  done 
by  computing  the  likelihoods  of  all  possible  configurations  that  will  result 
in  a  single  connected  object-region.  We  have  adopted  the  notion  that  no  pair 
of  regions  be  merged  unless  they  have  at  least  one  common  first-neighbor 
block.  By  first  neighbor  blocks  we  mean  background  blocks  that  are  adjacent 
to  the  boundary  of  the  object.  We  repeat  step  1  and  step  2  at  each  stage  of 
decreasing  block  size  until  each  block  consists  of  one  pixel. 

The  following  example  will  illustrate  the  algorithm  steps  more  clearly. 
Suppose  at  the  end  of  the  2nd  stage  we  obtained  the  segmentation  shown  in 
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Figure  3a.  Note  that  the  object  is  not  connected,  as  it  consists  of  two 
disjoint  regions  Rl,  R2.  Following  step  2  there  will  be  one  of  two  possible 
results.  Either  block  B  is  reallocated  to  the  object  class,  figure  3b,  or  Rl  or 
R2  are  reallocated  to  the  background  class,  figure  3c ,d,  whichever  is  more  likely 
This  results  in  one^^nnected  object  region  R.  We  expand  and  shrink  region  R 
by  partitioning  the  boundary  and  first-neighbor  blocks  as  in  step  1. 


3.2.1  Computational  Considerations  for  Stationary  Gaussian  MRF  Textures 

The  hierarchical  texture  segmentation  algorithm  involves  recognizing  the 
texture  region  association  of  large  blocks  of  image  data  in  the  early  stages 
and  small  blocks  of  image  data  in  the  last  stages  of  image  segmentation.  For 
the  former,  it  is  necessary  to  compute  the  likelihoods  of  large  blocks  of  the 
image.  For  the  latter,  it  is  necessary  to  compute  the  likelihoods  of  small 
blocks  of  data  conditioned  on  their  surrounds.  These  are  normally 
computationally  prohibitive  tasks.  In  this  section  we  will  show  how  to 
practically  compute  the  likelihoods  of  large  blocks  and  the  conditional 
likelihoods  of  small  blocks  of  data  conditioned  on  their  surrounds- 

The  free-boundary  condition  [15]  is  assumed.  The  meaning  of  this 
is  that  if  pixel  (i,j)  is  in  uhe  region  for  which  the  joint  likelihood  of  all 
picture  function  values  is  to  be  computed  and  if  pixel  (i,m)  is  not  in  the 
region  but  i£  in  the  conditioning  neighbor  set  given  in  (5),  the  picture 
function  yt>m  at  (  £,ra)  is  not  to  be  used  in  (5).  (Note,  (i.,ra)  is  in  the 
conditioning  neighbor  set  if  (£.,m)  c  Dp.)  Not  including  yj>in  is  equivalent 
to  setting  it  to  0  in  (5).  The  use  of  this  free  boundary  condition  does  not 
give  consistent  probabilities  for  all  subsets  within  the  region.  We  conment 
on  this  briefly  in  Section  7.0.  For  an  M  x  N  rectangular  lattice,  the  joint 
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density  function  is  given  in  (8).  For  simplicity,  we  assume  that  Y  (Y=Y-U(k)) 
represents  the  picture  function  minus  its  mean.  Y  is  a  vector  with  components 
yij-yij~u(k)  that  are  arranged  in  raster  scan  order. 


p(Y)  =  (2iro2)-(MN>/2|Bli5  exp{-Yc  B  Y/2a2} 


As  M, N  •»,  |b|  ,  the  determinant  of  B,  is  asymptotically  equal  to  the 
determinant  resulting  from  the  use  of  a  torus  structure  [21] .  (This  equality  is  in 
the  sense  that  the  difference  of  the  two  determinants  divided  by  either  one  of  them 

goes  to  0.)  This  occurs  because  as  M,N  ■+•  •»,  the  contribution  to  p(Y)  of 

the  y^j  near  the  window  border  becomes  unimportant;  hence  the  choice  of 
boundary  condition  becomes  unimportant.  For  the  torus  structure,  |  b|  takes 
the  simple  form  [21] 


M-l  N-l 

b|-  it  n  [i  -  2  l  eta 
i=0  j=0  _2<i,m<2 


'(r  + 


Because  of  the  raster  scan,  the  MN  x  MN  positive  definite  symmetric  matrix  B 
can  be  decomposed  as 
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®1>  b2  an<^  ®3  are  (M  X  M)  matrices.  The  matrix  represents  the  spatial 
interaction  of  a  row  (column)  with  itself,  while  &2  is  the  spatial  interaction 
matrix  between  two  adjacent  rows  (columns),  and  B3  is  the  spatial  interaction 
matrix  between  two  rows  (columns)  separated  by  one  intervening  row  (column). 
Then  for  a  2nd  order  model  the  quadratic  in  (8)  is  expanded  as 
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M  N-l 


ycby  =  Z  Z  ~  2Bn  Z  Z  ~  2^12  Z  Z  yi-»yi  j+i 

i-l  j-l  3  i-1  j-1  3  *3  i-1  j«l  13  1,3 
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~  2621  Ji  j-i  yijyi+l.j+l  “  2S22  yijyi+l,j-l 


We  proceed  now  to  show  how  to  recursively  compute  the  conditional  and 
unconditional  likelihood  functions  associated  with  the  16  partitions  of  a 
boundary  or  first  neighbor  block. 


Result  1 


Suppose  an  N  X  N  data  block  Y  is  divided  into  4  quadrants.  Let  Yj  be  the 
data  vector  in  quadrant  j  (j  =  1,2, 3,4)  where  these  quadrants  are  NW 
(Northwest),  NE,  SW  and  SE,  respectively.  Then  the  N^xN^  B  matrix  associated 
with  the  data  block  Y  can  be  partitioned  into 
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where  Bj(j  =  1,2, 3, 4)  is  the  interaction  matrix  associated  with  Yj  and  is 
given  in  (8b)  for  M  =  N  =  N/2.  B^j  is  the  interaction  matrix  between 
quadrants  i  and  j.  Note  that  the  Bj  are  the  same  for  all  quadrants,  and  that 
®12  =  ®34  and  Bl3  =  B24.  The  joint  likelihood  of  the  N  X  N  data  block  Y  can 
then  be  expressed  as 

2  4 

p(Y)  -  <2iroVN  /2|B|1/2exp{-(l/2o2)[  \  y!b  Y  ♦  2  J  Y?B.  Y.]>  (10) 

j-1  333  i,j  13  3 

l<i<j<4 

Proof  follows  from  the  preceding  decomposition  of  B. 

Let  Ity(  j  )  be  the  within-block  interaction  term  for  data  block  Yj. 

Iy( j  )  =  Y?BjYj.  Let  lg(  i  , j  )  be  the  between-block  interaction  term  for 
blocks  Yj[  and  Yj.  IB(  i  ,  j  )  *  Y^B^Yj  -  Y?B?  .Y^.  Hence 

An  p(Y)  =  (l/2)in|B|  -  (N2/2)£n  2*o2  -  l/2o2[  {  Iu(i)  +  2  £  I_(i,j)] 

i-1  i.j 

l<i<j<4 

The  next  result  gives  the  form  of  the  likelihood  of  a  block  X  conditioned  on 
the  surrounding  block(s)  Y. 

Result  2 

The  conditional  likelihood  of  a  data  block  X  given  surrounding  data 
blocks  Y  and  the  free  boundary  condition  outside  blocks  X  and  Y  is 

P(X|Y)  =  (2™2)Nx/2  iBj**  exp{-(l/2o2)  [XCBXX  +  ZxSi^Y  +  yS^B^B^Y]  }  (11) 

where  Nx  is  the  number  of  pixels  in  block  X. 

Proof:  See  Appendix  A 
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Remark 

As  Hence ,  fo^'^lalrge  N^  and^lrtfltf-oi^le/r 

process  YcB^Bx^Bx^_^an  be  neglectea-'tfnd  we  use  the  approximation 

00  con  fa  compu jfd  exactly  •  Tr  our  exper/me^is  hove  us<°e/ 

£np(  x|y)  *  (5s)£n|Bx|-(Nx/2)£n(2iro2)  -  (l/2a2)  [iw(x)+2IbCX,Y)1 

u»th  good  result*- 

From  Result  2,  it  follows  that  the  conditional  likelihoods  involved  in  the 
decision  mechanism  are  explicitly  expressed  as  a  summation  of  appropriate 
within-block  and  between-block  interactions.  These  terms  can  be  recursively 
computed  from  one  resolution  to  the  next.  For  example,  assume  that  at  stage  n 
the  window  has  M2  blocks.  Let  Ijj(i,j;n)  be  the  within-block  interaction  for 
the  (i,j)  block  at  resolution  n  under  texture  class  c.  Let  Icg( i, j , A ,m;n)  be 
the  between-block  interaction  between  block  (i,j)  and  block  (£,ra)  at 
resolution  n  under  texture  class  c.  (Note  that  increasing  n  means  smaller 
blocks,  with  the  largest  n  specifying  a  block  size  of  one  pixel).  Then  the 
within-block  interactions  at  resolution  (n  -  1)  are  computed  from  I^(.,  . ;  n)  and 
iB^ •»•>•*• ;n)  using  a  very  simple  ring-structure  shown  in  figure  A.  Here  the 
IyC.j.jn)  for  each  block  at  resolution  n  is  represented  by  a  node,  whereas  the 
jb( >n)  between  a  pair  of  blocks  is  represented  by  either  a  dashed  or 
solid  branch.  We  assume  that  the  sets  Iy(.,.;n)  and  Ig( ;n)  for 
the  blocks  at  resolution  n  have  been  computed  and  stored.  We  compute  the  sets 
ly(  • ,  • ,  ;n-l)  and  Ijj(  ;n-l)by  summing  up  the  branches  of  the 

appropriate  solid  rings,  and  the  branches  of  the  appropriate  dashed  rings 
given  at  resolution  n  (see  fig.  A).  Note  that  the  between-block  interaction 
between  2  non-neighboring  blocks  is  zero.  Here  the  neighborhoood  set  is  that 
which  is  defined  by  the  order  of  the  process.  For  a  5th-order  process  the 
only  nonzero  interaction  blocks  that  a  block  (i,j)  of  size  (2  x  2)  or  higher 
has  is  the  between  block  interaction  set  { Ig( i> j > i+k, j+)0  ,  -1  <  k,  l  <  1,  and 
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(k,£)  i-  (0,0)}.  With  special-purpose  hardware  one  can  compute  all  the 
I^(.,.;n-1)  and  l£( ;n-l)  simultaneously  using  (Icy(.,.;n), 

1^( ;n-l)) .  Because  of  this  recursion,  the  hierarchical  algorithm  is 
computationally  attractive. 

3.2.2  Amount  of  Computation  Required  for  Hierarchical  Segmentation  Using 
Stationary  Gaussian  MRF  Textures 

We  can  estimate  the  amount  of  required  computation  for  a  64  x  64  window 
and  2nd-order  model  as  follows.  At  the  2x2  block  level,  14  multiplications 
and  9  additions  are  required  for  computing  the  within-block  interaction  term 
for  each  block,  and  there  are  (32)2  Such  blocks.  Hence  a  total  of  (32)2  x  14 
multiplications  and  (32)2  x  9  additions  are  required.  For  the  between-block 
interaction  of  a  block  with  any  one  of  its  neighbors,  7  multiplications  and  3 
additions  are  needed  for  the  block.  The  total  number  of  multiplications  for 
the  between-block  interactions  is  7[(2x29x30+2x29x29)+4x30+4x2x29+29+4x3]  = 
7x3786  multiplications  and  3[(2x29x30+2x29x29)+4x30+4x2x29+4x3]  =  3x3786 
additions.  Hence,  at  the  (2  x  2)  block  resolution,  (32)2x14+7x3786  =  40,838 
and  3x3786  =  20,574  additions  are  used.  For  resolution  (n-2),  each  block  is 

(4  x  4).  Using  the  ring  structure,  9  additions  and  one  multiplication  are 
needed  for  the  within-block  interaction  for  each  block,  hence  a  total  of  (16)2 
multiplications  and  (16)2  x  9  additions.  For  the  between-block  interaction 
between  two  (4  x  4)  blocks,  we  need  1  multiplication  and  6  additions,  hence  a 
total  of  [(2x13x14+2x13x13) +( 14x4+2x13x4) +4x3) ]  =  882  multiplications  and  6x882 
additions,  etc.... 

Total  number  of  multiplications  for  all  the  I  is  14,677. 

Total  number  of  additions  for  all  the  Iy(  •  >  •  5  •  Hs  9556. 

Total  number  of  multiplications  for  all  the  Ig  (.,.,.,.;.)  is  27,580. 

Total  number  of  additions  for  all  the  Ig( )  is  27,042. 


.•>, 


sd 
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Without  this  ring  structure,  the  total  number  of  multiplications  for  ITI(.,.;.)  is  -’vv-'l 

w 

107 ,902  multiplications  and  for  Ig( )  is  44,020  —  hence,  a  total  of  >-.•  ~1‘; . 

151,922  multiplications  compared  to  42,257  multiplications.  Furthermore,  if  the  \ 
"within"  and  the  "between"  block  structures  that  we  developed  here  are  not  used, 
but  rather  p(Y,S)  is  computed  from  scratch  for  each  possible  partition  in  the  VVV 

hierarchical  algorithm,  then  the  number  of  multiplication  equivalent  operations  ■/-V-V- 

for  segmentation  is  at  least  fifteen  to  twenty  times  the  42,257  required  for  our  '-’-V-V-* 


algorithm. 


v.  .1 


3.2.3  Amount  of  Required  Computation  with  a  Parallel  Architecture  V-V-‘ 

The  bulk  of  the  computation  in  3.2.2  could  be  carried  in  parallel  with  a 

t  » 

simple  special  purpose  architecture.  Here  the  overall  structure  is  such  that  ■ 

each  block  at  a  given  resolution  (say  (n-l)st)  is  assigned  nine  processors.  The 
first  one,  called  the  within-processor,  computes  the  l^(i,j;n-l)  term  for  the 

_ _ m_ 

(i,j)  block  at  the  (n-l)st  resolution.  It  has  as  inputs  the  elements  of  the 

<  -  *  V  "  _*  "J 

appropriate  solid  ring  at  resolution  n.  An  example  is  shown  in  Fig  4.  The  V 

remaining  eight  processors  are  called  the  between-processors.  Since  each  block  V 

>-  m 

of  size  (2x2)  or  higher  has  eight  neighbors  as  shown  in  Fig  5  for  a  MRF  of  2nd 
order  through  fifth  order,  each  between  processor  has  the  task  of  computing  an 

_v_\ 

Ig( i,  j ,  • ,  •  jn-l)  term  for  the  interaction  between  block  (i,j)  and  one  of  its  8 

m 

neighbors.  Each  between-processor  has  as  inputs  the  elements  that  constitute  the  U_-/J 

v;Vv;j 

appropriate  dashed  ring  at  resolution  n.  Again  an  example  is  shown  in  Fig.  4.  ‘ -V-V-'.'-i 

< 

At  the  highest  resolution  (i.e.  at  2x2  block  level)  the  total  number  of  |  .-V V".  -| 

multiplications,  and  additions  is  14  and  9,  respectively,  for  the  within-processor , .'v'v'v'-] 
and  a  maximum  of  4  for  the  between  processor  for  a  2nd  order  MRF  (for  a  5th  order  V-j 

that  number  will  be  12).  For  lower  resolutions  only  1  multiplication  and  9  f  --  -1- 1' 


additions  are  needed  for  Iy(.,.;.).  A  maximum  of  1  multiplication  and  4 
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additions  are  required  for  Ig( .  The  total  computation  for 
{lw(. is  14  +  log  (N-l)  multiplications  and  9  logN  additions;  and  a 

maximum  of  4+log(N-l)  multiplications  and  41ogN  additions  for  Ig( . An 

upper  bound  for  the  required  computation  is  then  14+log(N-l)  multiplications 
and  91ogN  additions.  Note  that 
except  at  the  2x2  block  level ,  the 
processors  consist  mainly  of  adders 
and  are  hence  structurally  simple! 


Figure  5 


3.3.0  Pseudo -Likelihood 

Let  Y£j  denote  the  vector  Y  less 

a 

B 

a 

B 

a 

B 

the  component  Yi j •  Consider 

8 

a 

B 

a 

B 

a 

Yij|Yij  (i-e.,  y£j  conditioned  on  its 

a 

B 

a 

B 

a 

B 

surround),  and  suppose  the  window  is 

just  one  texture  type.  For 

B 

a 

B 

a 

B 

a 

simplicity,  assume  the  texture  model  Figure  6 

is  a  1st  order  Gaussian  MRF.  If  the  y^j  are  limited  to  one  code,  i,e.,  the 
pixels  marked  a  or  those  marked  0  in  Fig.  6,  the  conditioned  variables  y £ j J Y£ j 
will  be  statistically  independent.  Hence, 


<i,j> 


(12) 


e  one  code 

is  the  joint  conditional  likelihood  of  picture  function  values  for  pixels  in  one 
code.  Equivalently,  (12)  is  the  joint  likelihood  of  the  statistically 
independent  residuals  in  the  prediction  of  the  y£j  in  a  code  given  the 
surrounding  picture  function  values.  By  the  pseudo- likelihood  of  the  picture 
function  in  a  region,  we  mean  the  product  of  the  conditional  likelihoods  of  the 
picture  function  at  all  points  in  the  region,  e.g.,  (12a).  Note  that  the  pseudo¬ 
likelihood  is  not  a  true  likelihood;  rather  it  is  the  product  of  joint 


conditional  likelihoods,  each  joint  conditional  likelihood  being  for  one  code  in 
the  region. 


(12=) 

e  region 


Let  the  window  be  NXN.  Denote  by 

P<yijlYij»*>  (13) 

the  conditional  p.d.f.  of  yjj  given  its  surround  when  the  p.d.f.  parameters  are 
given  by  the  vector  <J>.  Then  it  is  easy  to  show  that  for  the  field  y  having 
parameter  vector  with  j*  4>, 


P{iim[  n  p(y,.|Y  )/p(y  |Y  *)]  >  1}  =  1  (14) 

(i,j)  t  1J 

That  is,  the  ratio  of  the  conditional  likelihoods  of  the  picture  function  values 
in  a  code  becomes  greater  than  1  for  almost  all  windows  as  N  ■+•  ».  Hence,  this 
conditional  likelihood  ratio  is  an  effective  texture-type  classifier . Furthermore , 
it  is  trivial  to  show  (Appendix  B)  (also  [ 2 1 } )  that  (14)  holds  if  all  pixels  in  a 
window  are  used,  and  better  performance  is  to  be  expected  then.  We  suggest  using 
this  pseudo-  likelihood  function  as  a  performance  functional  for  segmenting  MRF's 
when  the  pseudo-likelihood  estimated  parameters  for  the  Gaussian  MRF  are  those 
for  a  nonstationary  process. 


3.3.1  On  the  Use  of  the  Pseudo-Likelihood  Function  and  Maximum  Pseudo-Likehood 
Parameter  Estimates 

Though  we  successfully  perform  adaptive  maximum- likelihood  textured- image 
segmentation,  two  problems  are  encountered.  The  first  is  the  computation  of 
determinants  such  as  those  in  (5a)  for  irregularly  shaped  regions.  Approximations 
are  necessary  here.  In  our  hierarchical  algorithm  we  use  (8a)  for  computing 
| B |  .  The  approximation  is  satisfactory  here  because  the  algorithm  considers  only 


rectangular  regions,  square  regions,  or  a  combination  of  both  as  shown  in  Fig.  7 
For  the  latter  case,  we  use  the 
approximation  |b|  *  |b  ||b  |.  For 

an  arbitrary  shaped  region  the 
approximation  in  (8a)  may  not  be 
appropriate.  The  second  more  serious  problem  is  that  computation- intensive 
maximum  likelihood  parameter  estimation  is  necessary  if  the  MRF  models  used  are 
constrained  to  be  those  for  stationary  processes.  Using  the  asymptotic 
expression  for  the  likelihood  (5a),  good  parameter  estimates  can  be  obtained  for 
a  stationary  Gaussian  MRF  model  even  when  the  data  is  somewhat  nonstationary. 
However,  because  of  the  large  amount  of  required  computation,  online  adaptive 
segmentation  may  not  be  possible.  On  the  other  hand,  the  maximum  pseudo¬ 
likelihood  parameter  estimates  for  the  stationary  MRF  are  asymptotically 

consistent,  efficient,  and  computationally  simple  [21]  -  even  for  highly 

irregularly  shaped  regions.  (However;  there  is  no  guarantee  that  the  estimates 
based  on  finite  data  sets  will  be  those  for  a  stationary  MRF,  nor  even  those  fr_ 
a  Gaussian  MRF,  i.e.,  the  estimated  parameters  may  result  in  a  determinant  in 
(5a)  that  is  not  positive  semi-definite.)  The  pseudo- likelihood  function  can 
also  be  used  to  estimate  parameters  for  nonstationary  Gaussian  MRF's,  or  for 
MRF's  that  are  not  Gaussian.  Fortunately,  we  have  found  generally  that  these 
pseudo-likelihood  estimates  of  the  parameters  in  (5)  can  be  used  in  a  pseudo- 
likelihood  performance  functional  (Sec.  3.3.0,  6.0)  that  is  effective  for  both 
texture  recognition  and  hierarchical  textured  image  segmentation.  Hence,  when 
the  pseudo-likelihood  estimates  for  the  g£j  result  in  positive-semidef inite  B's 
for  (5a),  our  hierarchical  maximum-likelihood  segmentation  can  be  used,  and  when 
this  estimated  B  is  not  positive-semidef inite,  maximum  pseudo-likelihood 
segmentation  can  be  used.  One  other  option  for  hierarchical  segmentation  is  the 


-  23 


use  of  the  likelihood  performance  functional  -  but  neglecting  the  determinant 

of  B  in  (5a)  when  the  estimated  B  is  not  pos itive-semidef inite.  Experiments 
using  these  three  performance  functionals  are  described  in  Section  6.0. 


3.3.2  Amount  of  Computation  Required  for  Hierarchical  Segmentation  When  Using 
Nonstationary  MRF  Textures 


Similar  to  the  likelihood  function  of  a  Gaussian  MRF,  the  quadratic  of  the 


pseudo-likelihood  function  can  be  expressed  as  sums  of  data  covariances.  The 
neighborhood  structure  is  larger,  however.  This  can  be  seen  by  considering  a 
1st  order  Gaussian  MRF.  The  quadratic  in  the  conditional  function  p(yij|Yij), 
given  by 


[yij  -  3n(yi-ifj  +  yi+i,j)  -  ei2(yi,j-i  +  yi, j+i)]2/2o2, 

involves  covariances  (interaction  terms)  encountered  in  a  3rd  order  MRF  process 
(e.g.  yi-i^j  yi+l,jj  yi-1, j  Yi,j+1>  etc.)  The  pseudo-likelihood  function 
includes  covariances  in  the  621,  6221  ^31>  B32  directions  which  weren't  present 
in  the  likelihood  function.  The  quadratic  function  in  the  pseudo-likelihood  for 
a  1st  order  MRF  can  be  shown  to  be 


C°°  A  yij  +  CU  A  yiJyi+1>j  +  C12  ^  yijyi, j  +  1  +  C21  l  yijyi+l,j-M 

+  c22  [  yijyi+i,j-i  +  C31  l  yij7i+2,j  +  =32  l  yij7i,j+2 

i.j  i,j  i,j 

+  correction. 


Hence,  a  similar  ring-structure  is  appropriate  to  evaluate  the  (l^(.,.;.)}  and 
{ Ig( }  sets.  The  amount  of  computation  is  higher  than  that  for  the 
likelihood  function,  as  a  larger  number  of  covariance  functions  are  involved,  and 
there  is  a  correction  terra  which  is  nonexistent  when  the  likelihood  function  is 


used. 


The  ring  structure  is  fundamental  to  MRF's  in  genral,  not  just  the  Gaussian. 
For  the  class  of  auto-models  [15] ,  the  most  general  form  p(Y)  can  take  in  order 


to  give  a  valid  probability  sturcture  that  represents  a  MRF  is  given  by 

p(Y)  =  h  exp  Q(Y)  (15) 

where  h,  the  partition  function,  is  a  normalizing  constant  in  order  that  p(Y) 
have  unit  area.  Q(Y)  is  known  as  a  Gibbs  potential  and  is  shown  [15]  to  have  the 
form 


"  I  Yr  Gr  <Yr)  +  l  Gr_v(yr,y  )  y  y 
r  r,v  ■  r  v 


(16) 


This  has  the  same  form  as  the  quadratic  for  the  Gaussian  MRF  and  therefore 
the  compuation  involved  in  the  hierarchical  segmentation  can  be  carried  on 
through  a  similar  ring-structure.  This  fact  makes  the  hierarchical  segmentation 
computationally  attractive  and  highly  parallelizable  for  any  random  field  for 
which  the  Gibbs  potential  has  been  determined. 


1  •  Segmentation  Using  Auto-Binary  Fields  for  Region  Geometry  and  a  Parallel 
Iterative  Relaxation-Type  Algorithm 

In  Section  3,  the  segmentation  of  an  image  into  regions  each  of  which 
contains  a  single  Markovian  texture  field  was  considered.  The  hierarchical 
algorithm  was  based  solely  on  models  for  the  observable  texture  fields,  and  did 
not  incorporate  any  information  about  image  geometry  (i.e.,  shapes  of  the 
regions,  their  sizes,  etc.).  Because  of  the  lack  of  information  about  image- 
geometry,  boundaries  between  segmented  regions  in  noisy  images  are  often  jagged 
and  must  be  smoothed.  A  noncausal  auto-binary  field  could  be  used  towards  that 
end.  The  auto-binary  field  is  used  to  model  region  geometry  or  boundary  shape. 
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For  the  case  of  an  object  that  is  assumed  to  have  a  boundary  that  is  smooth  and 
slowly  curving,  an  appropriate  field  will  be  such  that  the  conditional 
probability  of  a  point  S£j  given  its  neighbors  will  be  low  for  the  case  where  a 
sharp  increase  in  the  boundary  slope  would  result.  Alternatively,  for  a  region 
with  high  curvatures ,  we  can  choose  the  parameters  for  the  auto-binary  field  so 
as  to  reflect  locally  this  property.  A  parallel,  iterative  relaxation  type 
algorithm  arises  naturally  here,  and  is  implemented  to  maximize  the  conditional 
likelihood  p(S|Y),  or  equivalently,  the  joint  likelihood  p(Y|s)p(S).  A 
sequential  algorithm  guaranteed  to  find  a  local  maximum  is  as  follows.  Suppose 
at  the  nth  iteration,  the  estimated  binary  field  is  Sn  with  value  s^at  pixel 
(i,j).  Note  that  S£j  takes  the  values  (0,1)  corresponding  to  texture  data  field 
classes  c  =  1,2  respectively.  The  different  steps  ares 

1.  Choose  any  pixel  (i,j). 

2.  Keep  as  is  or  change  it,  whichever  maximizes  p(S,Y).  This  provides 

a  new  estimate  Sn+1.  Hence 

p(Y  |  S  j ,  S£j  =  k)  P(Sij ,  s  £  j  =  k),  k  =  0,  l,must  be  computed. 

To  carry  out  these  computations  we  use 
p(Sij,Sij  )  =  p(s£j|Sjj)  p(S£j)  . 

Since  p(S£j)  is  not  a  function  of  s:j  ,  only  the  conditional  likelihoods 
p (s £ j  =  kjSjj)  ,  k  =  0,  1,  must  be  computed.  These  are  given  in  (6)  and 
are  simple  to  compute. 

3.  For  the  other  computation,  use 
P ( Y | Sfj i  s  £  j  =  k)  = 

P  (y  i  j  {  Yij  ,  S£j,s£j  =  k)  •  p(Y£j|  Sj.j,S£j  =  k)  ,  k  =  0,1. 

Since  the  second  factor  is  not  a  function  of  s£j  ,  all  that  need  be  computed 
is  the  first  factor  for  k  ■  0,1. 
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0  depending  on  whether  the  ratio  (17)  is 

.  .  n+1  n  v  n+l  .  n  (17) 

P^y  i j  I  Yij»sij  ■  sij)  P(sij  *  1 1  sn  ) / 

P<yij| Yij>sij  *  P(8ij  “ 

If  some  of  the  yi>m  in  the  neighborhood  system  of  yjj  are  missing  in 

Eq.  (17),  they  are  set  equal  to  0. 

To  carry  out  the  next  iteration,  choose  a  new  pixel  ( i * , j  * )  and  repeat. 

The  parallel  version  of  the  above  algorithm  is  based  on  dividing  the  entire 
data  window  into  different  subsets  called  codes.  For  example,  for  a  1st  order 
auto-binary  model  the  image  data  could  be  partitioned  into  two  sets  or  codes  as 
shown  in  Figure  6-  For  any  specific  pixel  a,  the  conditional  mean  of  the  image 
function  at  that  pixel  a  given  the  image  at  all  other  pixels  will  only  depend  on 
the  4  surrounding  8  pixels  and  not  on  any  a  pixels.  Hence  the  algorithm 
described  above  could  be  applied  to  all  the  a  pixels  simultaneously.  The  next 
iteration  would  then  involve  working  with  ail  the  6  pixels.  This  algorithm  will 
converge  with  fewer  passes  through  the  image  than  will  the  first  algorithm 
described. 

Note  that  the  joint  likelihood  function  is  a  multimodal  function  of  S.  The 
iterative  algorithm  is  guaranteed  to  find  a  local  maximum  of  the  joint  likelihood 
function  in  a  finite  number  of  iterations.  If  the  binary  field  is  used  within  a 
window  as  a  smoother  to  the  boundary  estimated  by  the  hierarchical  algorithm, 
then  only  pixels  adjacent  to  the  hypothesized  boundary  at  that  time  need  be 
tested,  rather  than  pixels  interior  to  the  hypothesized  regions.  In  this  sense, 
the  algorithm  would  be  related  to  the  Ripple  Filter  introduced  by  Cooper,  et  al. 

[1,3]. 


4.  Hence,  choose  sj+l  =  1  or  s£+l  " 
greater  than  or  less  than  1. 


4.2  Proof  that  the  Algorithm  Converges 


At  the  end  of  each  iteration  the  joint  likelihood  p(S,Y)  has  increased  or 


remained  unchanged.  For  an  N  x  N  window,  the  joint  likelihood,  as  a  function  of 
S,  can  take  at  most  2^2  values.  Because  of  this  and  the  fact  that  the  algorithm 


is  sure  that  Sn+1  differs  from  Sn  only  if  the  change  strictly  increases  the  value 


of  the  joint  likelihoood  function,  it  follows  that  the  algorithm  is  guaranteed  to 


find  a  local  maximum  of  the  joint  likelihood  function  in  a  finite  number  of 


iterations . 


4.3  Example  of  the  Form  of  the  Classification  Computation 


To  illustrate  explicitly  the  computations  made  during  each  iteration  of  the 


algorithm,  suppose  p(yij|y£m  i-n  region  k)  is 


N(p(k)  +  l  8r_v[yv  -  p (k) ] ,  o2(k)) 
veD 

P 


with  r  =  ( i, j) ,  v  =  (£,m). 


Then  the  pixel  texture-type  classification  function  in  its  simplest  form  is  to 


choose  srn+l  =  1  or  0  at  the  n+1  st  iteration  in  accordance  with: 


If  -£no(  1)  -  [l/2o2(l)]  (yr-p(l)  -  ]>  6 ( 1  > [ y v  -p(l)])2 

veD  r  V 
P 

+  £na(G)  +  [l/2o2(0)]{yr  -  p(0)  -  J  6(0>[yv  -  p(0)]}2 

veD  r  V 

P 

+  a  f  l  br.  van  (18) 

veD 

P 


_  n+1  ,  .  n+1  _ 

>  0,  then  8^  ■  1;  else,  0. 


(This  can  be  expressed  more  compactly  by  using  matrix  notation.) 


The  algorithm  in  Section  4.1  generally  converges  to  a  local  maximum  of 
p(Y,S).  Segmentation  errors  take  the  form  of  classification  errors  in  the 
vicinity  of  the  true  boundary  and/or  the  occurence  of  one  or  more 
incorrectly  classified  sizeable  regions.  Let  sjj  denote  the  value  of  s£j 
produced  by  the  segmentation  algorithm.  s£j  takes  values  of  1  and  0  in 
accordance  with  (18).  It  is  usually  the  case  that  for  many  of  the  pixels  (i,j) 
for  which  S£j  is  incorrect,  the  likelihood  ratio  in  (17)is  close  to  1. 

This  suggests  changing  the  values  of  all  sjj  for  which  the  likelihood  ratio  is 
close  to  1,  and  using  the  resulting  segmentation  as  an  initial  segmentation  for 
rerunning  the  algorithm.  The  algorithm  will  converge  to  some  new  segmentation 
In  almost  all  experiments  that  we  have  run,  S  is  closer  to  the  true  segmentation 

A 

than  is  S.  The  procedure  can  be  repeated  any  number  of  times.  It  is  possible 
for  one  of  these  perturbations  to  result  in  convergence  to  a  less  accurate 
segmentation.  Consequently,  it  is  desirable  to  use  some  measure  of  segmentation 
accuracy  to  determine  when  to  accept  a  limit  partition.  The  modified  algorithm 
is  as  follows. 

1.  After  convergence,  run  the  following  check  at  each  pixel  (i,j). 

1-e±Rijll  +  e  (19) 

(where  Rjj  is  the  ratio  in  (17)). 

2.  If  Rjj  lies  between  the  indicated  bounds,  change  s^j,  otherwise  keep  it 
as  it  is. 

3.  After  making  all  such  changes  throughout  the  image,  use  the  resulting 
segmentation  to  run  the  parallel  iterative  algorithm  until  it  converges 


again. 


(<UJ 


4.  At  each  convergence  S,  compute  the  pseudo  likelihood 
H  p(yijlYij>sij)p(sij|Sij). 

i.j 

(See  Section  3.3.0  for  the  significance  of  this  performance  functional.) 
Repeat  the  algorithm  until  there  is  little  change  in  a  few  successive  pseudo¬ 
likelihoods.  Keep  that  segmentation  for  which  the  pseudo- likelihood  is  a 
maximum.  That  segmentation  is  our  estimate  of  the  true  segmentation.  The 
reason  the  perturbation  algorithm  is  so  effective  is  that  the  segmentation 
resulting  from  the  first  convergent  running  of  the  algorithm  is  usually 
fairly  good. 

Two  changes  in  the  preceding  algorithm  could  be  considered.  The  first  is 
to  let  E  decrease  each  time  a  new  cycle  is  begun.  The  other  is  to  not  change 
the  classification  of  all  s£j  satisfying  (19),  but  rather  to  change  it  with 
some  probability,  perhaps  0.5.  This  makes  the  algorithm  more  analogous  to 
that  used  in  other  applications,  and  to  the  annealing  methods  explored 
recently,  and  earlier. 

4.5  The  Relaxation  Algorithm  Used  for  Seaming 

If  holding  computation  to  an  absolute  minimum  is  not  necessary,  a  simple 
convenient  seaming  procedure  is  to  seam  two  (or  four)  adjacent  windows  by  running 
the  iterative  parallel  algorithm  in  the  union  of  these  windows.  Since  the 
algorithm  begins  with  a  close- to- maximum -likelihood  segmentation,  it  goes  through 
only  a  few  iterations  in  order  to  converge.  If  the  two  (or  four)  windows  contain 
the  same  two  texture  types,  the  hidden  region  modelling  field  used  will  be 
binary.  If  three  texture  types  are  involved,  the  region  modelling  field  should 
be  ternary.  In  a  typical  image  such  as  Fig.  15c,  if  windows  of  modest  size  are 
used,  the  great  majority  contain  at  most  two  texture  types;  a  small  number  of 
windows  contain  three  texture  types . 
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4.6  Segmentation  of  Images  of  Manufactured  Objects 

Smooth  dull  surfaces  such  as  those  of  many  manufactured  objects  result  in 
picture  functions  that  are  spatially  slowly  varying  and  are  well  approximated 
over  much  of  their  extents  by  low  degree  polynomial  functions.  Such  images  are 
easily  segmented  using  the  adaptive  hierarchical  or  iterative  relaxation-type 
algorithms.  One  solution  [  23  ]  is  an  extension  of  the  solution  to  the  problem  of  the 
segmentation  of  an  image  consisting  of  two  or  more  regions  of  constant  but 
unknown  image  intensity  plus  white  Gaussian  noise.  See,  e.g.,  Figs.  14atb  and  the 
associated  discussion  in  Section  6.0.  In  extending  that  solution,  instead  of  the 
mean  value  of  the  picture  function  in  region  k  being  some  a  priori  unknown 
constant  y(k),  it  might  be  a  linear  function  taking  value  y0(k)  +  Y].(k)i  +  Y2<k)j 
at  pixel  (i, j) ,  where  YoOO,  Yi(k),  Y2<k)  are  a  Priori  unknown  constants.  Or 
the  mean  value  function  might  be  a  quadric  function  taking  the  value 
Y0(k)  +  Yi<k)i  +  Y2OOJ  +  Y3<k)ij  +  Y4(k)i2  +  Y5<k)j2  at  pixel  (i,j).  Hence, 
when  using  the  adaptive  hierarchical  segmenter,  the  y^CkVs  could  be  first 
estimated  using  small  image  blocks  and  unsupervised  learning.  These  estimates 
are  used  to  start  the  hierarchical  segmenter,  which  would  adapt  further  during 
the  segmentation  procedure.  Maximum  likelihood  or  Bayesian  parameter  estimation 
is  used  during  this  stage.  Adaptation  can  also  be  done  when  using  the  iterative 
segmenter. 

5.0  Adaptation 

An  algorithm  for  segmenting  textured  images  will  be  of  no  value  unless  it 
has  the  capability  to  estimate  image  model  parameters  during  the  segmentation 
process.  This  assertion  is  based  on  the  observation  that  model  parameters  for  a 
texture  type  will  generally  vary  greatly  from  one  image  to  another.  The  source 
of  this  parameter  variability  is  variation  in  lighting,  scale,  and  simply 
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variability  within  a  broad  class  such  as  tree  foliage,  grass,  earth,  roads,  etc. 
(Within  a  single  image,  there  generally  is  only  slow  spatial  variation  in  the  MRF 
parameters  for  a  single  texture  type.)  Consequently,  either  no  a  priori 
information  is  assumed  concerning  texture-type  model  parameters,  or  a  priori 
distributions  are  assumed  for  these  parameters,  and  these  distributions  are  used 
in  Bayesian-like  parameter  estimation  during  the  segmentation  process.  In  many 
instances  the  approximation  that  model  parameters  are  constant  over  a  texture- 
type  region  is  satisfactory.  In  other  instances,  slow  spatial  variation  must  be 
assumed  for  model  parameters.  We  briefly  comment  on  how  the  first  situation  can 
be  handled  using  our  segmenters.  An  appropriate  approach  for  the  latter 
situation  is  slightly  more  complicated  and  involves  modelling  the  spatial 
parameter  variation  for  a  texture-type  model  by  an  MRF.  Practical  incorporation 
of  adaptation  into  segmentation  algorithms  is  a  substantial  topic  by  itself  and 
is  treated  in  a  subsequent  paper.  At  the  moment,  we  simply  want  to  establish 
feasibility  of  solution. 
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Both  the  iterative-parallel  and  the  hierarchical  segmentation  algorithms  can  "'vl 


function  in  an  adaptive  mode,  but  require  some  prior  texture-type  model  parameter 
information.  Three  possible  means  of  obtaining  this  information  are  as  follows. 

1.  Use  small  windows,  perhaps  16x16  or  32x32  in  a  single  image  to  estimate 
model  parameters.  It  is  assumed  here  that  the  great  majority  of  small 
windows  of  this  size  contain  only  one  texture-type  region.  Effective 
parameter  estimation  is  possible  by  maximizing  the  pseudo-  likelihood 
function  (see  section  3.3.1).  Parameter  estimates  from  small  windows  in  a 
texture-type  region  will  cluster.  Distributions  for  the  various  texture- 
type  model  parameters  can  then  be  obtained. 

2.  Obtain  prior  distributions  for  texture-type  model  parameters  from  previously 
segmented  images. 
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3.  Produce  a  first  crude  segmentation  based  on  gray  level  or  image  features  and 
image  understanding.  Use  model  parameters  or  parameter  distributions 
estimated  from  this  initial  segmentation. 

The  hierarchical  segmenter  is  ideally  suited  to  adaptation  since  large 
region  blocks  are  classified  initially,  and  good  parameter  adaptation  can  be  had 
by  using  these  large  blocks  for  parameter  estimation  as  though  each  large  block 
contained  data  from  one  texture-type.  In  a  typical  large  block,  most  of  the 
pixels  will  be  from  a  region  of  the  same  texture-type.  This  is  the  approach 
exhibited  in  Figs.  9b,  lib.  Parameter  adaptation  realized  by  incrementally 
updating  parameter  estimates  following  each  iteration  of  the  iterative 
segmentation  algorithm  is  discussed  in  a  forthcoming  paper. 

6.0  Experiments 

The  segmentation  algorithms  have  been  run  on  visible  light,  infrared,  and 
artificially  generated  data.  Generally,  all  of  our  algorithms  work  extremely 
well  on  stationary  artificially  generated  MRF*s,  almost  as  well  on  stationary 
natural  data,  and  less  well  but  good  on  nonstationary  natural  data.  The 
following  examples  illustrate  these  cases.  Fig.  8a  consists  of  two  constant  gray 
level  squares  in  a  background  of  some  other  constant  gray  level,  plus  white 
Gaussian  noise.  The  signai-to-noise  ratio  (the  difference  in  average  values  in 
the  squares  and  background  divided  by  the  noise  standard  deviation)  is  1. 
Segmentation  by  the  iterative  segmenter  is  carried  out  using  an  isotropic  binary 
field  having  parameter  values  a  =  -16.0,  b>>t  =  1.5.  Fig.  8b  is  the  initial 
segmentation  accomplished  by  assigning  a  pixel  to  square  or  background  depending 
on  whether  it  is  greater  than  or  less  than  (robj  +  rback^/2»  respectively.  Here 


r0bj  and  rback  are  the  average  picture  function  values  in  the  object  and  the 


-  33 


background,  respectively.  Following  21  passes  through  the  image,  the 
segmentation  converged  to  is  that  in  Fig.  8C.  Fig.  9  is  the  infrared  image  of  a 
tank  and  background  to  which  has  been  added  white  Gaussian  noise  and  illustrates 
the  importance  of  parameter  adaptation  in  the  segmentation  process.  The  same 
type  of  modelling  used  in  Fig.  8  is  appropriate  here.  The  signal-to-noise  ratio 
is  1.3,  and  a  2nd  order  binary  MRF  is  used.  Fig.  9a  is  the  segmentation  produced 
by  the  hierarchical  segmenter  using  a  ML  (maximum  likelihood)  performance 
functional,  and  incorrect  values  for  r0bj»  rback>  Starting  with  the  same 

model  parameters  as  in  Fig.  9a but  using  the  adaptive  segmenter,  results  in  the 
segmentation  shown  in  Fig.  9b.  Fig.  9c  is  the  result  of  running  this  segmenter 
with  parameters  estimated  using  unsupervised  learning  on  the  window  (see  [2]) 
prior  to  segmentation. 

In  Fig.  10  we  are  seeing  experiments  with  the  iterative-parallel  segmenter 
operating  on  artificially  generated  images.  Two  Gaussian  textured  regions  are 
present,  both  with  the  same  means  and  conditional  variances,  but  otherwise 
different  first  order  MRF's.  There  is  a  sinusoidal  boundary  separating  the  two 
regions.  The  upper  region  has  strong  correlation  in  the  vertical  direction, 
whereas  the  lower  region  has  strong  correlation  in  the  horizontal  direction. 

Model  interaction  parameters  for  the  two  stationary  Gaussian  fields  are 
B.,.  =  .3,  and  -.1  for  the  vertical  and  horizontal  directions  in  the  upper  image, 
and  the  permutation  of  these  for  the  l~wer  image.  A  second  order  binary  S  field 
was  used  with  model  parameters  a  =  -4.6,  b.,.  =  2  in  the  horizontal  and  vertical 
directions  and  0.3  in  the  diagonal  directions.  Note  that  since  the  mean  values 
in  the  two  texture-type  regions  are  the  same,  it  is  impossible  to  do  any 
meaningful  segmentation  based  on  thresholding  the  picture  function.  Spatial 
variation  in  the  picture  function  must  be  exploited.  Fig.  10a  is  the  original 


image. 


-  34 


Fig.  10b  is  the  initial  segmentation  based  on  use  of  the  conditional  likelihood 
of  the  picture  function  at  a  point  given  its  surround  under  each  of  the  two 
texture  hypotheses.  Fig.  10c  is  the  segmentation  resulting  from  the  first 
convergence  of  the  algorithm.  The  segmentation  is  then  perturbed  using  the 
modified  iterative-parallel  segmenter  (sec  4.1),  and  the  algorithm  run  again. 
After  four  such  cycles,  the  segmentation  produced  is  that  in  Fig.  lOd.  Measured 
values  of  the  pseudo  -likelihood  performance  functional  for  the  five  segmentations 
at  the  convergence  were  -1.607e,  -1.612e,  -1.613e,  -1.613e,  -1.614e, 
respectively,  where  e  is  a  constant.  Note  that  the  changes  are  small  because  the 
pseudo- likelihood  function  is  for  the  segmentation  of  a  whole  window,  and  the 
number  of  pixel  classification  changes  from  one  convergence  to  the  next  is  small 
compared  with  the  number  of  pixels  in  the  window.  But  this  performance 
functional  is  useful  for  deciding  on  whether  a  new  convergent  segmentation  is 
better  than  an  earlier  one. 

Again,  Fig.  11  illustrates  the  all  important  adaptive  segmentation.  Fig. 

11a  is  the  segmentation  of  the  image  data  in  Fig.  10,  but  now  using  the 
hierarchical  M.L.  (maximum  likelihood)  segmenter  with  incorrect  model  parameters 
for  the  Gaussian  MRF  texture  model.  Fig.  lib  is  the  result  of  starting  off  with 
the  same  incorrect  parameters,  but  then  running  the  segmenter  in  the  adaptive 
mode.  The  resulting  segmentation  can  now  be  smoothed  using  a  few  passes  of  the 
iterative  segmenter  with  its  binary  region  model.  Hierarchical  segmentation  will 
also  work  well  on  such  stationary  data  it  the  likelihood  function  without  3 
determinant  (see  Eq.  5a),  or  the  pseudo -likelihood  function  is  used  as  a 
performance  functional.  These  various  performance  functionals  will  give  roughly 
the  same  segmentations  down  to  block  sizes  of  8  x  8  and  sometimes  4x4  pixels, 
but  at  the  2x2  pixel  ^solution  use  of  the  maximum  likelihood  performance 
functional  results  in  more  accurate  segmentation.  Hence,  use  of  the  determinant 
of  B  does  make  a  difference!  (see  the  discussion  in  Sec.  7.) 
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Nonstationary  data  provides  the  real  challenge  to  segmentation  algorithms 
based  on  MRF  data  models.  Figs.  12a-12h  illustrate  a  number  of  considerations  in 
segmenting  grass  (upper)  and  earth  (lower)  regions  in  a  portion  of  an  outdoor 
visible  light  scene,  using  the  hierarchical  segmenter.  Because  maximum 
likelihood  estimation  of  model  parameters  is  computationally  costly,  we  have 
experimented  with  the  computationally  less  costly  maximum  pseudo -likelihood  model 
parameter  estimates.  If  the  estimated  parameter  values  result  in  B  matrices 
(Eq.  5a)  that  are  nonpositive  semidef inite,  which  is  the  case  in  Figs.  12d,  12e, 
the  Gaussian  likelihood  function  cannot  be  used  as  a  performance  functional  in 
the  hierarchical  segmenter.  Consequently,  we  consider  three  performance 
functionals,  namely,  the  L  (likelihood),  the  L.W.  (likelihood  without  the  B 
determinant  in  Eq.  5a),  and  the  P.L.  (pseudo  likelihood).  In  Figs.  12a-12e, 
parameter  values  are  estimated  prior  to  segmentation.  Figs.  12a,  12b,  12c  are 
hierarchical  segmentations  using  the  L.  ;  L.W. ,  and  P.L.  performance  functionals, 
respectively,  and  maximum  likelihood  parameter  estimates.  Second  order  Gaussian 

MRF  models  were  used.  In  these  figures,  the  image  is  divided  into  four  windows, 

and  segmenters  are  run  independently  in  each.  The  maximum  likelihood  parameter 

estimates  were  constrained  to  be  those  for  a  s  tat  ionary  Gaussian  model  and  were 

obtained  by  using  asymptotic  methods.  Parameter  estimates  obtained  are  shown  in 

the  table.  Figs.  12d,  12c  are  hierarchical  segmentations  using  P.L.  and  L.W. 
performance  functionals  respectively,  and  pseudo-likelihood  parameter  estimates  . 
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Even  though  the  image  data  is  somewhat  nonstationary,  the  combination  of  maximum 
likelihood  parameter  estimates  and  L.  segmentation  performance  functional  is 
best.  In  Figs.  12c,  12d  and  other  experiments  run  with  nonstationary  image  data, 
the  P.L.  segmentation  performance  functional  results  in  segmentations  that 
reasonably  approximate  the  true  boundary,  but  false  boundaries  are  sometimes 
found  as  well.  For  such  data,  hierarchical  L.W.  segmentation  usually  reasonably 
approximates  the  true  boundary  when  used  with  maximum  likelihood  parameter 
estimates,  but  sometimes  completely  misses  the  boundary,  as  in  Fig.  12e,  when 
used  with  maximum  pseudo  likelihood  parameter  estimates.  The  problem  in  this 
latter  case  may  be  caused  by  abnormally  large  positive  values  for  the  exponent  in 
the  likelihood-without-determinant  for  some  erroneous  segmentation  because  the  B 
matrix  is  non  positive  semidefinite  and  therefore  has  at  least  one  negative 
eigenvalue.  Figs.  12f,  12g  are  the  results  of  beginning  with  the  segmentations 
in  Figs.  12a  and  12d,  respectively,  and  seaming  with  the  iterative  segmenter. 
Notice  that  the  effect  of  the  seaming  is  to  smooth  out  the  estimates  of  the  true 
boundaries  and  to  remove  some  of  the  erroneously  estimated  regions.  Finally, 

Fig.  12h  is  the  classification  of  8  x  8  pixel  blocks  in  the  image  independently, 
using  a  P.L.  classifier,  as  discussed  in  Sec.  3.3.0,  with  maximum  pseudo 
likelihood  parameter  estimates.  Note  that  almost  all  blocks  are  classified 
correctly,  suggesting  that  with  slight  modification  the  hierarchical  P.L. 
segmenter  might  be  iess  prone  to  generating  extraneous  boundaries  when  the  data 
is  difficult  to  distinguish  fl6  x  16  pixel  blocks  are  all  classified 

correctly  by  the  algorithm  for  this  image.)  Figure  13  is  the  hierarchical  L. 
segmentation  of  two  nons  tat  ionary  artificially  generated  textures  with  sinusoidal 
boundary,  based  on  use  of  maximum  likelihood  parameter  estimates.  The 
hierarchical  P.L.  segmentation  based  on  maximum  pseudo- likelihood  parameter 
estimates  was  almost  as  good  (there  were  no  spurious  segmentations),  and  this  was 
true  of  the  other  segmenters  as  well. 


Fig.  14a  is  an  artificially  generated  image  of  a  Lambertian  can  illuminated 
by  a  point  source  at  infinity.  Fig.  14b  is  the  result  of  hierarchical 
segmentation.  Of  importance  here  is  that  the  window  to  be  segmented  consists  of 
portions  of  the  images  of  the  can  top  and  can  side  at  a  location  such  that  the 
image  intensities  of  the  two  surfaces  are  the  same  in  a  small  region  in  the 
center  of  the  image.  In  other  words,  there  is  not  a  strong  intensity 
discontinuity  between  regions!  The  image  model  used  for  the  two  surfaces  is  a 
constant  plus  noise  for  the  can  top,  and  a  quadric  polynomial  plus  noise  for  the 
can  side.  The  segmentation  algorithm  is  a  trivial  extension  of  our  algorithms 
for  segmenting  models  for  which  the  image  intensity  over  a  region  is  constant 
plus  white  Gaussian  noise. 

In  Fig.  15a  we  see  an  interesting  image  in  which  there  is  some  deterministic 
geometric  structure,  largely  in  the  house,  but  where  the  rest  of  the  image  is 
largely  stochastic  texture  structure.  We  chose  six  texture-types  to  work  with, 
the  left  tree  foliage  with  roughly  isotropic  spatial  variation  (denoted  tr^) ?  the 
foliage  of  the  large  tree  on  the  right  with  greater  spatial  correlation  in  the 
vertical  than  in  the  horizontal  direction  (denoted  tr^)  >  house  roof  shingles,  skv, 
road,  and  grass.  A  set  of  MRF  model  parameters  was  estimated  for  each  of  these 
texture-types,  from  a  small  data  window  in  each  case.  Then  these  models  were 
used  to  generate  slightly  larger  windows  of  artificial  images,  and  the  generated 
images  were  inserted  in  the  picture  close  to  the  locations  from  which  the  data 
for  estimating  model  parameters  wis  caken,  Fig.  15b.  The  artificially  generated 

tr^  texture-type  image  is  a  square  window  in  the  upner  left  corner  of  the  picture. 
The  other  artificially  generated  square  window  images  can  be  recognized  in 

different  locations  in  the  picture.  What  is  surprising  is  that  the  models  appear  ( 
to  capture  the  texture  structure  amazingly  well.  All  of  the  models  used  are  2nd 
order  MRF's,  except  for  the  roof  model  which  is  5th  order.  This  higher  order  was 
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necessary  to  capture  the  horizontal  shingle  structure  because  the  angle  involved 
is  different  from  0  degrees  with  respect  to  the  horizontal  axis  for  the  picture. 
Fig.  15c  illustrates  the  effectiveness  of  the  pseudo- likelihood  function  (12a)  as 
a  recognizer  of  the  texture-type  seen  within  a  window.  Window  data 
classifications  were  made  independently  of  one  another  here.  Notice  that 
incorrect  classifications  were  made  in  the  dark  area  in  the  foliage  of  the  left 
tree,  and  much  of  the  sky  region  was  incorrectly  classified.  The  first  set  of 
errors  is  due  to  the  fact  that  the  model  parameters  for  a  texture-type  vary 
(usually  slowly)  within  an  image,  and  this  variation  should  be  incorporated  into 
the  models  used.  The  second  set  of  errors  is  due  to  the  occurrence  of  artifacts 
such  as  the  images  of  telephone  wires  in  the  misclassif ied  windows.  Note  that 
the  bushes  and  a  third  small  tree-foliage  region  on  the  right  have  been 
classified  as  tr  type  regions.  This  is  because  models  were  not  introduced  for 
these  texture-types,  and  the  data  seen  looks  very  much  like  that  in  the  tr^  type 
region.  Finally,  note  that  two  windows  toward  the  right  side  of  the  grass  region 
have  been  classified  as  road  regions.  Though  it  is  difficult  to  tell  from  this 
image,  these  windows  lie  in  a  driveway  region  that  looks  very  much  like  the  road 
image  texture.  More  windows  in  this  region  are  not  classified  as  road  because 
these  other  windows  contain  grass  image  texture  as  well. 
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7.0  Physically  Meaningful  Models,  Mathematical  Correctness,  and  ■  V-V-V-' 

•  »** 
■V>V; 

Computational  Complexity 

We  comment  on  two  important  problems  in  this  section.  ’v['- 

1.  Consider  Eq.  (17).  The  meaning  of  p(yij | Yij,S£j*  *  1,  S£j  )  is  that  y£j  is 

conditioned  only  on  those  yv  in  the  neighbor  set  Dp  in  (5)  for  which  s“  =  1.  All 

other  yv  in  Dp  are  omitted;  equivalently,  they  are  set  to  0.  We  shall  refer  to  tht/.’.-yO/ 

.  *  «  •  • 

yv  that  are  set  to  0  as  constituting  missing  data.  The  same  situation  arises 

with  the  hierarchical  algorithm  where  the  conditional  likelihood  of  a  block  of  '»•  v.'-r.?1 .i 

>  _  m 

data  must  be  computed.  This  conditioning  of  y^:  by  setting  missing  picture  V*! 

function  values  to  0  is  not  physically  meaningful  and  also  leads  to  probabilities  ‘ 

of  segmented  images  that  do  not  constitute  a  consistent  set  of  probabilities. 

(Note,  the  missing  data  problem  also  occurs  at  the  four  sides  of  a  window,  but 

since  this  window  boundary  is  fixed  and  boundary  effects  die  out  away  from  the 

boundary,  significant  harmful  effects  on  segmentation  do  not  occur  here.)  Small 

improvement  in  segmentation  accuracy  can  be  made  by  correcting  the  problem.  The 

solution  is  to  treat  {y;;}  as  though  the  texture-type  1  model  is  used  to  generate  '<.7.;/* 

•m  *,  m 

all  the  data  in  the  window,  and  a  subset  of  the  data  points  is  chosen,  namely,  "V! IS 

those  yv  for  which  sv  =  1.  Then,  the  true  conditional  likelihood 

p(yij  |yv  for  which  sv  =  1,  class  1)  is  computed.  This  conditional  likelihood, 

which  for  reasons  given  is  not  (5),  is  Gaussian.  The  conditional  variance  of  y £ ;  r/,\ 

,  _  M 

is  at  least  as  large  as  0^  (1)  and  lies  between  this  and  the  marginal  variance  of  /j 

r  •  »  r  •.  m  + 1 

y£;  for  class  1.  It  can  be  computed  approximately  for  various  combinations  of  yv  ‘/•V-Vv] 

•  «•  v*  v, 

present  (see  Appendix  C,  result  4).  The  true  conditional  mean  of  y..  can  be 

!•:  ■  r£~ 

simply  computed  iteratively,  as  briefly  described  in  Appendix  C,  result  5. 
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Hence,  with  a  modest  amount  of  extra  computation,  a  more  physically  meaningful 
probability  measure,  which  is  also  consistent  can  be  used  for  image  modelling  for 
the  purpose  of  textured  image  segmentation.  Of  course,  corresponding  results 
apply  when  working  with  S£j^  “  0. 
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The  problem  of  having  a  consistent  probability  measure  can  be  easily  treated 
directly  by  designing  a  single  MRF  that  incorporates  both  a  region  and  a  texture 
model  [5,8].  Upon  applying  that  formulation  to  our  problem,  the  Gibbs  potential 
would  be 

6-  [YrU (1)]*  B(l)  [Yj-UCl]]  ♦  [Yq-UCO)]1  B(0)  [Yq-U(0)]  ♦  S^S.  (20) 

Here,  B(l)  and  B  (0)  are  the  inverse  covariance  matrices  for  the  region  1  and  the 

» 

region  0  texture  data  vectors,  respectively,  and  B  is  a  matrix  having  values  a 
along  the  main  diagonal  and  br_v  elsewhere.  Note  that  the  resulting  p(Y,S)  is 

h  exp  G  (21) 

where  h  does  not  depend  on  the  window  partition  for  which  p(Y,S)  is 

computed.  This  likelihood  is  not  the  same  as  in  (2a)  using  (5a)  to  specify 

p({yv} :  sv=k| class  k) .  The  dependence  of  the  likelihoods  on  the  data  Yj,  Y0  is 

exactly  the  same  in  the  two  cases.  However,  whereas  the  likelihood  which  is 

the  combination  of  2a  and  5a  has  a  multiplicative  factor  | B(k) | that  is  a 

function  of  the  partition  of  the  window,  h  in  (21)  is  not  a  function  of  the 

partition.  The  multiplicative  factor  is  substantial  when  the  associated  texture- 

type  region  comprises  many  pixels.  It  is  o“^(k)  when  Yjc  consists  of  one  pixel. 

It  is  possible  to  include  additional  quantities  in  (20)  that  results  in  a. 

modified  (21)  being  more  like  the  combination  of  (2a)  and  (5a),  but  it  is  not 

clear  that  (21)  can  be  modified  to  be  close.  For  example,  G  can  be  modified  to 

r  _1 

include  £a(sv).  This  produces  a  somewhat  strange  MRF  for  modelling  the  image. 
However,  use  of  this  G  in  an  iterative  algorithm  would  result  in  exactly  the  same 
algorithm  as  in  Eq.  (18). 

2.  A  second  important  problem  is  that  the  process  defined  by  (5)  may  be 
stationary  Gaussian,  nonstationary  Gaussian  or  nonGaussian,  depending  on  the 


Sij's.  For  example,  as  discussed  earlier  B  in  (5a)  must  be  positive  semidefinite 
for  the  MRF  in  an  NxN  window  to  be  Gaussian.  This  restricts  the  Bjj's.  As  N*«®, 
the  Sij's  become  increasingly  restricted,  and  in  the  limit  satisfy  the 
restrictions  for  a  homogeneous  Gaussian  MRF.  If  B  in  (5a)  is  not  positive 
semidefinite,  the  resulting  exponential  still  defines  a  MRF  if  the  exponential 
has  finite  integral.  A  sufficient  condition  for  finiteness  of  the  integral  is 
that  | y £ j |  be  uniformly  bounded  above.  This  will  always  be  the  case  for  picture 
functions  generated  by  real  sensors. 

8.0  Conclusions 

MRF's  appear  to  be  very  powerful  for  modelling  textured  images  for  the 
purpose  of  textured  image  segmentation  and  classification.  New  methods  for 
image  segmentation  were  presented  in  this  paper.  These  algorithms  are  not 
computationally  costly,  and  can  be  realized  using  parallel  computer 
architectures,  thus  permitting  real-time  image  segmentation.  The  algorithms  are 
computationally  simple.  However,  there  are  subtle,  probably  important, 
considerations  to  be  aware  of  in  using  the  approach  presented.  Further  study  of 
these  issues  is  worthwhile.  Fortunately,  the  MRF's  have  an  interesting 
structure,  and  unlocking  all  of  their  secrets  promises  to  be  an  interesting 
experience.  For  the  moment,  the  reader's  attention  is  directed  to  the  following 
important  cons ideraticns . 

1.  Since  the  image  texture  models  used  are  Gaussian  stochastic  processes,  do 
they  provide  any  advantages  over  the  use  of  Fourier  analysis  or  finite  impulse 
response  filtering?  The  answer  is  a  definite  yes!  The  primary  advantage  is  that 
Fourier  analysis  or  finite  impulse  response  filtering  is  applied  over  a 
rectangular  window  of  image  data.  The  window  must  be  large  enough  to  include  at 
least  a  few  spatial  cycles  of  the  picture  function.  Hence,  the  boundary 


estimation  accuracy  achievable  with  this  method  is  poor.  On  the  other  hand, 
segmentation  of  MRF  models  is  very  suitable  to  highly  irregular  boundaries.  In 
the  experiments  we  have  run,  boundary  location  estimation  error  usually  lies 
between  0  and  3  pixels.  Error  analysis  for  the  hierarchical  algorithm  is  treated 
in  a  subsequent  paper  [22] .  A  second  advantage  of  the  MRF  approach  is  that 
additional  image  model  structure  is  easily  incorporated.  In  this  paper,  we 
incorporated  a  texture  region  model.  Boundary  shape  models  can  be  incorporated. 
(We  have  found  the  use  of  boundary  shape  models  to  be  of  great  use  in  earlier 
work  [1,2]).  A  third  advantage  of  the  MRF's  is  that  they  are  ideally  suited  to 
handling  images  where  the  texture  parameters  are  a  priori  partially  unknown  or 
are  spatially  varying.  These  are  the  conditions  usually  encountered.  In  this 
paper,  we  have  briefly  exhibited  an  approach  to  treating  the  former  case.  In  the 
latter  case,  the  spatial  variation  can  be  appreciable.  For  example,  in  Fig. 15 
the  texture  in  the  dark  portion  of  tree  foliage  type  1  is  closer  to  the  typical 
texture  in  the  region  of  tree  foliage  type  2.  Therefore,  estimating  the  region 
occupied  by  a  texture  type  should  be  based  not  on  a  fixed  parameter  model  for  a 
texture-type  picture  function,  but  rather  on  one  where  the  texture  model 
parameters  at  a  point  in  a  texture-type  region  are  close  in  value  to  the  model 
parameter  values  at  nearby  points  in  the  region.  In  [23]  a  MRF  is  used  to  model 
parameter  variation,  and  maximum  likelihood  estimation  of  these  spatially  varying 
parameters  i.3  shown  to  be  an  important  ingredient  of  Bayesian  textured  image 
segmentation.  Though  the  case  of  apriori  fixed  but  partially  unknown  texture 
parameters  is  easy  to  handle  using  the  Fourier  analysis  approach  to  image 
segmentation,  it  is  not  apparent  how  that  approach  could  handle  the  case  of 
spatially  varying  texture  parameters.  The  one  advantage  of  Fourier  Analysis  with 
respect  to  the  MRF's  is  the  ability  to  efficiently  represent  nonparametric 
spectra.  However,  texture  segmentation  is  probably  practical  only  for  simple 
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spectra  having  one  or  two  modes.  The  MRF's  should  be  good  models  for  these,  and 
indeed  work  well  in  practice. 

2.  It  was  shown  that  maximum  likelihood  segmentation  of  images  of  manufactured 
parts  is  possible  with  the  same  algorithms.  Here,  the  picture  function  of  a 
slowly  curving  3-D  surface  is  spatially  slowly  varying.  Two  approaches  were 
taken  to  this  problem.  The  first  was  the  modeling  of  the  image  of  a  surface  as  a 
small  piece  of  a  MRf [13] .  The  MRF  model  parameters  were  treated  as  a  priori. 
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partially  unknown.  The  second  was  to  treat  this  image  as  a  polynomial  with  a 
priori  partially  unknown  parameter  values  plus  white  noise.  In  both  cases,  the 
partially  unknown  parameter  values  were  estimated  during  image  segmentation. 

More  extensive  treatment  of  adaptive  segmentation  using  the  polynomial  model  is 
given  in  a  forthcoming  paper. 

3.  Both  segmentation  algorithms  presented  are  highly  parallel  ,  and  will  easily 
run  in  real  time  on  the  appropriate  architectures  (which  can  be  built  using 
present  technology).  The  iterative  relaxation  algorithm  has  a  very  simple 

structure,  and  is  well  suited  to  use  with  general  MRF  models -  even  those  having 

non-constant  parameters.  Whereas  the  iterative  algorithm  will  generally  make 
only  a  small  number  of  passes  through  an  image,  there  are  cases  where  it  may  make 
many  passes  when  changing  the  classification  of  a  region  from  an  initial 
classification  of  one  texture-type  to  a  final  classification  of  another  texture- 
type.  In  these  situations,  the  algorithm  changes  only  a  few  pixels  at  the  region 
boundary  during  each  pass  through  the  image.  A  way  to  overcome  this  is  to  use 
the  iterative  algorithm  with  2x2  or  4x4  blocks  or  with  blocks  of  decreasing  size. 
Such  a  modification  would  run  much  faster  and  would  be  a  hybrid  of  our 
hierarchical  and  relaxation  algorithms.  The  hierarchical  ripple  filter  [3]  uses 
this  philosophy. 
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The  iterative  algorithm  is  closely  related  to  the  annealing  algorithms,  but 
runs  faster  though  it  may  not  always  be  quite  as  accurate  in  finding  a  globally 
optimum  image  segmentation  if  the  initial  segmentation  used  in  beginning  the 
iterations  is  highly  in  error.  However,  what  is  most  important  is  that  the 
iterative  algorithm  will  always  work  with  nonstationary  MRF  models  or  with 
conditional  Gaussian  models  when  the  parameters  are  such  that  the  picture 
function  is  not  quite  Gaussian  (see  Sec.  (6.0))  ;  the  annealing  algorithms 

usually  will  not  work  under  these  conditions.  But  these  conditions  are  the  ones 
that  are  often  prevalent  in  real  image  data! 

The  hierarchical  algorithm  achieves  huge  computational  savings  through 
certain  divide  and  conquer  type  recursions  (a  logarithmic  computational  cost 
through  use  of  a  new  simple  ring  structure)  and  also  through  organizing  the  bulk 
of  the  computation  as  computation  on  the  data  only.  These  data  computations  can 
then  be  combined  with  model  parameters,  at  small  computational  cost,  to  realize 
texture  segmentation.  The  complete  computational  cost  benefit  of  the  algorithm  is 
realized  only  for  fields  with  spatially  constant  parameters.  However,  the 
hierarchical  algorithm  can  always  be  applied  to  nonconstant-parameter  MRF's,  and 
will  always  be  effective  in  achieving  high  segmentation  accuracy. 

4.  Three  performance  functionals  are  discussed  and  shown  to  be  useful  for  the 
recognition  of  blocks  of  textured  image  or  for  hierarchical  textured-image 
segmentation.  These  are  the  joint  likelihood  function  with  and  without  the 
deter. oinants  of  the  inverse  covariance  matrices,  and  the  joint  pseudo-likelihood 
function.  The  first  functional  usually  produces  the  greatest  accuracy,  whereas 
the  accuracies  of  the  latter  two  are  comparable.  Examples  of  behavior  are  given 
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Appendix  A 


Proof  of  Result  2. 

Under  the  free  boundary  condition, 

p(X,Y)  =  constant •exp{-(l/2o2)[XtBxX  +  2XtBXYY  +  YCBYY] > 

From  this  it  can  be  seen  that  p(X|  Y)  is  Gaussian  with  covariance  matrix  Bx^  a 
mean  vector  “By^Bj^yY.  Hence, 

p(X | Y )  =  |Bxr'S(2Tra2)”Nx/2  *  exp{-(l/2o2)  [(X  +  Bx1BXYY)tBx(X  +  B^B^Y)]) 

=  |Bx|l5(2*o2)'Nx/2  •  exp{-(l/202)[XtBxX  +  2XtBXYY  +  Y^^B^^Y]  } 


Appendix  B 


Theorem 


Consider  C  homogeneous  Gaussian  MRF  texture  models;  the  covariance  matrix 
:‘jr  every  set  of  pixels  in  a  texture-type  region  is  nonsingular.  Let  I,(Y|k) 
denote  the  discriminant  (1/N2)  fcn[  Jl  pCyij | Yjj ,k)] f  k«  1,2,... ,C  ,  for  the  kth 

(i.j) 

c  n 

texture-type  in  an  NxN  pixel  window.  The  texture  recognizer  based  on  choosing 
the  texture  type  having  the  largest  discriminant  incurs  a  probability  of 
raisclassif ication  that  goes  to  0  w.p.l  as  N  +  ». 

Proof 

Assume  that  the  texture-type  under  which  the  data  was  generated  is  kQ. 

(y|k>» 

=-(!l)£n[2Tro2(k)]-  1/N22a2(k)  l  {yiry(k>-  Y  B(k)  [y,  -y(k)]}2  (Bl) 

(i,j)  J 

£  ft  gD 

P 

is  the  sample  mean  of  approximately  N2  identically  distributed,  partially 
correlated  r.v.'s 


-(Js)Hn[2td2(k)3-[l/2o2(k)]{y.  .-p(k)  -  £  6(k)  [y.  -p(k)]}2 

11  U,m) 


r 


Since  {y^}  homogeneous,  so  is  (B2)  for  all  k.  Because  {y^}  is  a 
nonsingular  homogeneous  MRF,  both  its  autocorrelation  function  and  that  of 
(b2)  fall  off  exponentially  for  large  distance  from  the  origin.  Using  this 
and  the  Borel  Cantelli  Lemma  [24] ,  it  is  easy  to  prove  a  strong  law  of 
large  numbers,  specifically  that 

*(Ylk>  E[zi;j  (k)  |kQ]  as  N  -*>  »,  (B 

where  the  expection  is  with  respect  to  the  measure  for  Y. 

Claim: 

Etz^  (kQ)  |kQ]  >  (k)  |kQ]  for  k^.  (B 

This  comes  from  a  standard  information  theory  argument  as  follows. 


E[zi;J(k)|k]  = 


■ 

1 

—  I  Q  n 

p<yijlVko> 

. 

p<yiilYli-ko>  -  p<yillYli-k)' 

*  X.J1 

p(yl3|Yij-ko  > 

’  p(ynlYi3-ko)p(Yijlko)dyljdYl3 

i}  [*»P‘y1JlYirk0)]p(y1Jly1J,k0)p«1J|ko)dyljdYlj  (B 

-  [P<y1jlYij'ko)  -  P<ylJlYirk>]P«ljlk„>dyijdYiJ 


with  equality  for  p(y^  |  Y^  ,k)  -  pCy^  lYij  »ko^  for  a11  yij  ,Yij  ‘  Since 
p(yi3!V  k>  13  an  exponential  function  in  y^  and  ,  because  of  the 
nonsingularity  of  the  process  it  follows  that  there  is  equality  only  if 
k=kQ.  Inequality  (B5)  comes  from  use  of  the  inequality 
f.n(l  +  e)  <  e  for  -»<£<“». 


Because  of  the  preceding  inequality  and  (b3), 
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Z(Y|  k0)/£(Y]  k)  _ ^  a  as  N-+«,  ke{l,2,  . . .  ,C}  and  k  i  k0,  where  a  is  a 

W.P.l 

constant  strictly  greater  than  1.  Thus,  probability  of  correct  classification 
converges  to  1  with  probability  1  as  N  goes  to  infinity. 

The  proof  can  be  extended  to  nonGaussian  MRF's  and  to  some  nonstationary 
ones.  A  simple  procedure  is  to  consider  the  sequence  L^,  where  takes  the 
value  1  if  [  l  p(yijlYij,k0)/p(y^j}Y^j,k)]  <  1,  and  is  0  otherwise. 

(i,  j) 

e  ^ 


Note  that  k  ^  k0  in  the  inequality.  Then  the  idea  is  to  prove  that  as  N-*“, 
takes  the  value  1  only  finitely  often*  But  this  will  be  the  case,  by  the  Borel 

CO 

Cantelli  Lemma,  if  (  £  p{Ln  =  1})<°°  .  Hence,  this  proof  depends  on  showing  for 

N=2 

the  MRF  model  under  consideration  that  the  preceding  sum  of  probabilities  is 


finite. 


where  B,  B^,  are  the  within  interaction  matrices  for  the  whole  window,  W,  and 
Z,  respectively;  and  Byg  is  the  between  interaction  matrix  for  W  and  Z.  See 
Secs.  2.2.1  and  3.2.1. 


2.  P(z|w)  *  NC-b^b^w,  o^b”1) 

(See  Section  3.2.1) 

3.  (i)  p(Z)  N  (0,02[Bz  - 

This  result  follows  from  p(W,Z)  =  p(w|Z)p(Z)  and  use  of  results  1  and  2. 

Note  that  the  covariance  matrix  for  Z  is  "bigger"  than  (Bg)-!,  because  (Bz)-! 
is  the  conditional  covariance  matrix  for  Z  given  that  the  surrounding  neighbors 
have  value  0. 

(ii)  p(Z)  =  cp(W, Z) 

where  c  is  a  suitable  normalizing  constant,  and  W  is  the  value  of  W  that 
maximizes  the  joint  likelihood  p(W,Z).  This  result  is  discussed  in  [12;  pp. 
55,56]  and  a  simple  iterative  relaxation  algorithm  for  computing  W  is  discussed 
in  [13;  Appendix],  and  also  described  as  result  5  in  this  Appendix  of  this  paper 


4.  p(Y|w)*uW(  .  ,  o2[By  -  B^B"1  B^]-1) 


This  result  follows  from  p(X,Y|W)  ^  N(.,B~^)  obtained  from  use  of  result  2,  and 
then  the  application  of  result  2  to  p(X' ,Y')  where  Z'  Is  Z|W.  The  fact  that 
cov[Y|w]  does  not  depend  on  or  makes  sense,  since  the  covariance  matrix 

for  Y  when  Z  is  conditioned  on  the  free  boundary  condition  at  its  border  is 
exactly  the  same  as  when  conditioning  on  a  combination  of  the  free  boundary 
condition  and  on  W  where  W  borders  on  Z. 


5. 


We  describe  a  simple  algorithm  for  computing  E[y|w]  when  W,  X,  Y,  Z  are  as 


updating  the  estimate  of  the  component  of  Z  at  pixel  (i,j)  found  at  the  nth 


n+1 


stage.  Then  Z£j  ,  the  estimate  at  the  n+1  stage,  is  computed  as 


zilL  *  V  +  I  6r-v<tn  -  u) 


veDr 


previously  described.  Note  that  E[Z|w]  =  [(Efxjwpt,  (E [Y |W  ]) t  ,so  thatE[Y|W] 
is  the  lower  vector  in  E[zjw].  From  p(Z,W)  »  p(z|w)p(W),  it  is  seen  that  E[z|w] 
is  the  value  of  Z  for  which  p(Z,W)  is  a  maximum,  i.e.,  it  is  the  maximum 
likelihood  estimate  of  Z.  But  p(Z,W)  is  a  quadric  form  in  Z  with  positive 
semidefinite  matrix.  Assume  the  matrix  is  positive  definite.  Hence,  p(Z,W)  has 
only  one  maximum  as  a  function  of  Z,  and  any  one  of  many  iterative  algorithms  may 
be  used  to  compute  this  value  of  Z.  We  use  a  relaxation  algorithm.  Consider 


-  „  -  j  • .  * 
v'V  *. 


l. 


where  this  is  the  expression  for  the  conditional  mean  in  Eq.  (5),  and  t®  is  a 
component  of  either  Zn  or  W,  depending  on  v.  As  with  the  relaxation  algorithm  in 
Sec.  4.1,  estimates  at  all  locations  in  a  code  can  be  made  simultaneously. 
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